(57) Abstract 

Apparatus and methods for rendering 3D-graphics images. 
The apparatus include a port for receiving commands from a 
graphics application, an output for sending a rendered image to 
a display and a fragment-operations pipeline, coupled to the port 
and to the output, the pipeline including a stage for performing 
a fragment operation on a fragment on a per-pixel basis, as well 
as a stage for performing a fragment operation on the fragment 
on a per-sample basis. The stage for performing on a per-pixel 
basis is one of the following: a scissor-test stage, a stipple-test 
stage, an alpha-test stage or a color-test stage, and the stage 
for performing on a per-sample basis is one of the following: a 
Z-test stage, a blending stage or a dithering stage. The apparatus 
programmatically selects whether to perform a stencil test on a 
per-pixel or a per-sample basis and performs the stencil test on the 
selected basis. The apparatus also programmatically selects pixel 
samples for per-sample operations, where the sample selections 
differ with different instances of the same per-sample operation. 
The apparatus also programmatically selects a set of subdivisions 
of a pixel as samples for use in the per-sample fragment operation, 
programmatically assigns different weights to at least two samples 
in the set and performs the per-sample fragment operation on 
the fragment, using the programmatically selected and differently 
weighted samples. 

PATENT 



Apparatus and Method for Fragment Operations 
in a 3D-Graphics Pipeline 

Cross-Reference to Related Applications 

This application claims the benefit under 35 USC Section 119(e) of U.S. 
Provisional Patent Application Serial No. 60/097,336, filed 20 August 1998 and entitled, 
"GRAPHICS PROCESSOR WITH DEFERRED SHADING" and claims the benefit under 
35 USC Section 120 of U.S. Patent Application Serial No. 09/213,990, filed 17 December 
1998 entitled, "HOW TO DO TANGENT SPACE LIGHTING IN A DEFERRED 
SHADING ARCHITECTURE," each of which is hereby incorporated by reference. 

This application is also related to the following U.S. Patent Applications, each of 
which is incorporated herein by reference: 

Serial No. 09/213,990, filed 17 December 1998, entitled, "HOW TO DO 
TANGENT SPACE LIGHTING IN A DEFERRED SHADING ARCHITECTURE" (Atty. 
Doc. No. A-66397); 

Serial No. , filed 20 August 1999, entitled, "APPARATUS AND 
METHOD FOR PERFORMING SETUP OPERATIONS IN A 3-D GRAPHICS 
PIPELINE USING UNIFIED PRIMITIVE DESCRIPTORS" (Atty. Doc. No. A-66382); 

Serial No. , filed 20 August 1999, entitled, "POST-FILE SORTING 
SETUP" (Atty. Doc. No. A-66383); 

Serial No. , filed 20 August 1999, entitled, "TILE RELATIVE Y-VALUES 
AND SCREEN RELATIVE X-VALUES" (Atty. Doc. No. A-66384); 

Serial No. , filed 20 August 1999, entitled, "SYSTEM, APPARATUS 
AND METHOD FOR SPATIALLY SORTING IMAGE DATA IN A THREE- 
DIMENSIONAL GRAPHICS PIPELINE" (Atty. Doc. No. A-66380); 

Serial No. , filed 20 August 1999, entitled, "SYSTEM, APPARATUS 
AND METHOD FOR GENERATING GUARANTEED CONSERVATIVE MEMORY 
ESTIMATE FOR SORTING OBJECT GEOMETRY IN A THREE-DIMENSIONAL 
GRAPHICS PIPELINE" (Atty. Doc. No. A-66381); 

Serial No. , filed 20 August 1999, entitled, "SYSTEM, APPARATUS 
AND METHOD FOR BALANCING RENDERING RESOURCES IN A THREE- 
DIMENSIONAL GRAPHICS PIPELINE" (Atty. Doc. No. A-66379); 

Serial No. , filed 20 August 1999, entitled, "GRAPHICS PROCESSOR 
WITH PIPELINE STATE STORAGE AND RETRIEVAL" (Atty. Doc. No. A-66378); 

Serial No. , filed 20 August 1999, entitled, "METHOD AND 
APPARATUS FOR GENERATING TEXTURE" (Atty. Doc. No. A-66398); 

Serial No. , filed 20 August 1999, entitled, "APPARATUS AND 
METHOD FOR GEOMETRY OPERATIONS IN A 3D GRAPHICS PIPELINE" (Atty. 
Doc. No. A-66373); 

Serial No. , filed 20 August 1999, entitled, "APPARATUS AND 
METHOD FOR FRAGMENT OPERATIONS IN A 3D GRAPHICS PIPELINE" (Atty. 
Doc. No. A-66399); and 

Serial No. , filed 20 August 1999, entitled, "DEFERRED SHADING 
GRAPHICS PIPELINE PROCESSOR" (Atty. Doc. No. A-66360). 

Field of the Invention 

This invention relates to high-performance 3-D graphics imaging. More 
particularly, the invention relates to per-fragment operations in a 3D-graphics pipeline. 

Background 

Three-Dimensional Computer Graphics 

Computer graphics is the art and science of generating pictures with a 
computer. Generation of pictures, or images, is commonly called rendering. Generally, in 
three-dimensional (3D) computer graphics, geometry that represents surfaces (or volumes) 
of objects in a scene is translated into pixels stored in a framebuffer and then displayed on 
a display device. 

In a 3D animation, a sequence of still images is displayed, giving the 
illusion of motion in three-dimensional space. Interactive 3D computer graphics allows a 
user to change his viewpoint or change the geometry in real time, thereby requiring the 
rendering system to create new images on the fly. 

In 3D computer graphics, each renderable object generally has its own local 
object coordinate system and, therefore, needs to be translated (or transformed) from object 
coordinates to pixel-display coordinates. Conceptually, this translation is a four-step 
process: 1) translation from object coordinates to world coordinates, the coordinate system 
for the entire scene, 2) translation from world coordinates to eye coordinates, based on the 
viewing point of the scene, 3) translation from eye coordinates to perspective-translated 
eye coordinates and 4) translation from perspective-translated eye coordinates to pixel 
(screen) coordinates. These translation steps can be compressed into one or two steps by 
pre-computing appropriate translation matrices before any translation occurs. 
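
The compression of several translation steps into one can be illustrated with a minimal sketch. It assumes 4x4 homogeneous matrices applied to column vectors; the helper names and the particular translation and scale values are hypothetical and chosen only for illustration.

```python
# Minimal sketch: pre-multiplying transformation matrices so that a point
# needs one matrix-vector product instead of one product per step.
# All helper names and numeric values are illustrative assumptions.

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Apply a 4x4 matrix to a homogeneous column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def uniform_scale(s):
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]

# Step 1 (object to world): scale by 2, then move to (1, 2, 3).
object_to_world = mat_mul(translation(1, 2, 3), uniform_scale(2))
# Step 2 (world to eye): move the scene 10 units along -z.
world_to_eye = translation(0, 0, -10)
# Pre-computed composite matrix covering both steps.
object_to_eye = mat_mul(world_to_eye, object_to_world)

point = [1, 1, 1, 1]
two_steps = mat_vec(world_to_eye, mat_vec(object_to_world, point))
one_step = mat_vec(object_to_eye, point)
```

Both paths give the same eye-space point; the composite matrix is computed once and reused for every vertex of the object.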

(Translation from object coordinates includes scaling to enlarge or shrink 
an object. Perspective scaling makes farther objects appear smaller. Pixel coordinates are 
points in three-dimensional space in either screen precision (that is to say, pixels) or object 
precision (that is to say, high-precision numbers, usually floating-point).) 

Once the geometry is in screen coordinates, it is broken into a set of pixel- 
color values (that is, "rasterized") that are stored into the framebuffer. 

A summary of the prior-art rendering process can be found in Watt, 
Fundamentals of Three-dimensional Computer Graphics (Addison-Wesley Publishing 
Company, 1989, reprinted 1991, ISBN 0-201-15442-0, herein "Watt" and incorporated by 
reference), particularly Chapter 5, "The Rendering Process," pages 97 to 113, and Foley et 
al., Computer Graphics: Principles and Practice, 2nd edition (Addison-Wesley Publishing 
Company, 1990, reprinted with corrections 1991, ISBN 0-201-12110-7, herein "Foley et 
al." and incorporated by reference). 

FIG. 1 shows a three-dimensional object, a tetrahedron, with its own 
coordinate axes (x_object, y_object, z_object). The three-dimensional object is translated, scaled and 
placed in the viewing point's coordinate system based on (x_eye, y_eye, z_eye). The object is 
projected onto the viewing plane, thereby correcting for perspective. At this point, the 
object appears to have become two-dimensional. The object's z-coordinates, however, are 
preserved for later use in hidden-surface removal. The object is finally translated to screen 
coordinates, based on (x_screen, y_screen, z_screen), where z_screen is going perpendicularly into the 
page. Points on the object now have their x and y coordinates described by pixel location 
(and fractions thereof) within the display screen and their z coordinates in a scaled version 
of distance from the viewing point. 

Generic 3D-Graphics Pipeline 

Many hardware renderers have been developed. See, for example, Deering 
et al., "Leo: A System for Cost Effective 3D Shaded Graphics," SIGGRAPH93 
Proceedings, 1-6 August 1993, Computer Graphics Proceedings, Annual Conference Series 
(ACM SIGGRAPH, 1993, Soft-cover ISBN 0-201-58889-7 and CD-ROM ISBN 0-201- 
56997-3, herein "Deering et al." and incorporated by reference), particularly at pages 101 
to 108. Deering et al. includes a diagram of a generic 3D-graphics pipeline (that is to say, a 
renderer, or a rendering system) that it describes as "truly generic, as at the top level nearly 
every commercial 3D graphics accelerator fits this abstraction." This pipeline diagram is 
reproduced here as FIG. 6. (In this figure, the blocks with rounded corners typically 
represent functions or process operations, while sharp-cornered rectangles typically 
represent stored data or memory.) 

Such pipeline diagrams convey the process of rendering but do not describe 
any particular hardware. This document presents a new graphics pipeline that shares some 
of the steps of the generic 3D-graphics pipeline. Each of the steps in the generic 3D- 
graphics pipeline is briefly explained here. (Processing of polygons is assumed throughout 
this document, but other methods for describing 3D geometry could be substituted. For 
simplicity of explanation, triangles are used as the type of polygon in the described 
methods.) 

As seen in FIG. 6, the first step within the floating point-intensive functions 
of the generic 3D-graphics pipeline after the data input (step 612) is the transformation 
step (step 614), described above. The transformation step also includes "get next 
polygon." 

The second step, the clip test, checks the polygon to see if it is at least 
partially contained in the view volume (sometimes shaped as a frustum) (step 616). If the 
polygon is not in the view volume, it is discarded. Otherwise, processing continues. 



WO 00/11605 



PCT/US99/19363 



The third step is face determination, where polygons facing away from the 
viewing point are discarded (step 618). 

The fourth step, lighting computation, generally includes the set up for 
Gouraud shading and/or texture mapping with multiple light sources of various types but 
could also be set up for Phong shading or one of many other choices (step 622). 

The fifth step, clipping, deletes any portion of the polygon that is outside of 
the view volume because that portion would not project within the rectangular area of the 
viewing plane (step 624). Generally, polygon clipping is done by splitting the polygon 
into two or more smaller polygons that all project within the area of the viewing plane. 
Polygon clipping is computationally expensive. 

The sixth step, perspective divide, does perspective correction for the 
projection of objects onto the viewing plane (step 626). At this point, the points 
representing vertices of polygons are converted to pixel-space coordinates by step seven, 
the screen space conversion step (step 628). 

The eighth step (step 632), set up for an incremental render, computes the 
various begin, end and increment values needed for edge walking and span interpolation 
(e.g., x, y and z coordinates, RGB color, texture-map-space u and v coordinates and the 
like). 

Within the drawing-intensive functions, edge walking (step 634) 
incrementally generates horizontal spans for each raster line of the display device by 
incrementing values from the previously generated span (in the same polygon), thereby 
"walking" vertically along opposite edges of the polygon. Similarly, span interpolation 
(step 636) "walks" horizontally along a span to generate pixel values, including a 
z-coordinate value indicating the pixel's distance from the viewing point. Finally, the 
z-buffered blending (also referred to as Testing and Blending) (step 638) generates a final 
pixel-color value. The pixel values include color values, which can be generated by simple 
Gouraud shading (that is to say, interpolation of vertex-color values) or by more 
computationally expensive techniques such as texture mapping (possibly using multiple 
texture maps blended together), Phong shading (that is to say, per-fragment lighting) and/or 
bump mapping (perturbing the interpolated surface normal). 
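
As an illustration of the incremental "walk" along a span, the following sketch interpolates a z value across one horizontal span by repeatedly adding a fixed increment. The function name and the choice to interpolate only z are assumptions made for brevity; a real span interpolator would step color and texture coordinates the same way.

```python
# Minimal sketch of span interpolation, assuming the begin, end and
# increment values were computed by the set-up step. Only z is stepped
# here; the function name is hypothetical.

def interpolate_span(x_start, x_end, z_start, z_end):
    """Return (x, z) for each pixel of a span, stepping z incrementally."""
    count = x_end - x_start
    dz = (z_end - z_start) / count if count else 0.0
    z = z_start
    pixels = []
    for x in range(x_start, x_end + 1):
        pixels.append((x, z))
        z += dz  # incremental update, no per-pixel division
    return pixels

span = interpolate_span(2, 6, 0.0, 1.0)
```

The per-pixel cost is one addition per interpolated value, which is why the set-up step computes the increments ahead of time.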

After drawing-intensive functions are completed, a double-buffered MUX 
output look-up table operation is performed (step 644). The generic 3D-graphics pipeline 
includes a double-buffered framebuffer, so a double-buffered MUX is also included. An 
output lookup table is included for translating color-map values. 

By comparing the generated z-coordinate value to the corresponding value 
stored in the Z Buffer, the z-buffered blend either keeps the new pixel values (if they are closer 
to the viewing point than the previously stored value for that pixel location) by writing them into 
the framebuffer or discards the new pixel values (if they are farther). 

At this step, antialiasing methods can blend the new pixel color with the old 
pixel color. The z-buffered blend generally includes most of the per-fragment operations, 
described below. 

Finally, digital to analog conversion makes an analog signal for input to the 
display device. 



Per-Fragment Operations 

In the generic 3D-graphics pipeline, the z-buffered-blend step actually 
incorporates many smaller per-fragment operational steps. 

Application Program Interfaces (APIs) define a set of per-fragment 
operations. Open Graphics Library (OpenGL), D3D, Performer, Inventor and B-Render 
are examples. A review of some exemplary OpenGL per-fragment operations follows so 
that generic similarities and true differences between the inventive structures and methods 
and conventional structures and procedures can be more readily appreciated. The language 
of the OpenGL API is adopted, except as contraindicated herein. (See, for example, OpenGL 
Architecture Review Board, OpenGL Reference Manual, 2nd edition (Addison-Wesley 
Developers Press, 1996) and OpenGL Architecture Review Board, OpenGL Programming 
Guide, 2nd edition (Addison-Wesley, 1997), both incorporated herein by reference.) 

A framebuffer stores a set of pixels as a two-dimensional array. Each pixel 
stored in the framebuffer is a set of bits. The number of bits per pixel may vary depending 
on the particular implementation or context. An implementation may allow a choice in the 
selection of the number of bits per pixel, but within a context all pixels have the same 
number of bits. 

Corresponding bits from each pixel in the framebuffer form a bitplane. 
Each bitplane contains a single bit from each pixel. The bits at location (x, y) of all the 
bitplanes in the framebuffer constitute the single pixel (x, y). Groups of bitplanes form 
several logical buffers, namely, the color, depth, stencil and accumulation buffers. 

The color buffer, in turn, includes a front left, front right, back left, back 
right and some additional auxiliary buffers. The values stored in the front buffers are the 
values typically displayed on a display monitor while the contents of the back buffers and 
auxiliary buffers are invisible and not displayed. Stereoscopic contexts display both the 
front left and the front right buffers, while monoscopic contexts display only the front left 
buffer. In general, the color buffers must have the same number of bitplanes, but particular 
implementations or contexts may not provide right buffers, back buffers or auxiliary buffers 
at all, and an implementation or context may additionally provide or not provide stencil, 
depth or accumulation buffers. 

The color buffers generally consist of unsigned-integer color indices (R, G, 
B) and, optionally, an unsigned-integer value "A". The values, however, could 
be floating-point numbers or signed-integer values. The number of bitplanes in each of the 
color buffers, the depth buffer (if provided), the stencil buffer (if provided) and the 
accumulation buffer (if provided) is fixed on a per-context basis. If an accumulation buffer 
is provided, it has at least as many bitplanes per R, G and B color component as do the 
color buffers. 

A rasterization-produced fragment with window coordinates of (x_window, 
y_window) modifies the pixel in the framebuffer at those coordinates based on a number of 
tests, parameters and conditions. Among the several tests typically performed sequentially, 
beginning with a fragment and its associated data and finishing with a final output stream 
to the framebuffer, are (in the order performed, with some variation among APIs): pixel- 
ownership test, scissor test, alpha test, color test, stencil test, depth test, blending, dithering 
and logic operations. Each of these tests or operations is briefly described below. 
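
The sequential nature of these tests can be sketched as a loop that stops at the first failing test. The fragment representation (a dictionary), the function name and the two example tests are hypothetical; only the idea of applying tests in a fixed order and discarding on failure comes from the description above.

```python
# Minimal sketch of sequential per-fragment testing, assuming a fragment
# is a dictionary and each test is a predicate. Names are illustrative.

def run_fragment_tests(fragment, tests):
    """Apply tests in order; return the first failing test's name, or None."""
    for name, test in tests:
        if not test(fragment):
            return name  # fragment is discarded here
    return None  # fragment survives to blending, dithering and so on

tests = [
    ("scissor", lambda f: 0 <= f["x"] < 64 and 0 <= f["y"] < 64),
    ("alpha", lambda f: f["alpha"] > 0.5),
]

visible = {"x": 10, "y": 20, "alpha": 0.9}
faint = {"x": 10, "y": 20, "alpha": 0.1}
offscreen = {"x": 100, "y": 20, "alpha": 0.1}
```

Because a discarded fragment skips all later tests, cheap tests placed early in the order save the work of the later, more expensive ones.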

(OpenGL does not provide for an explicit color test between the alpha and 
stencil tests. OpenGL per-fragment operations are applied after all the color 
computations.) 

- Pixel-Ownership Test 

The pixel-ownership test determines if the pixel at location (x_window, 
y_window) in the framebuffer is currently owned by the graphics-language context. If it is 
not, the window system decides the fate of the incoming fragment. Possible results are that 
the fragment is discarded or that some subset of the subsequent per-fragment operations are 
applied to the fragment. Pixel ownership allows the window system to properly control 
the GL's behavior. 

Assume that in a computer having a display screen, one or several processes 
are running and that each process has a window on the display screen. For each process, 
the associated window defines the pixels to which the process wants to write or render. 
When there are two or more windows, the window associated with one process may be in 
front of the other window associated with another process, behind that other window or, 
along with the other window, entirely visible. Since there is only a single framebuffer for 
the entire display screen, the pixel-ownership test determines which process and associated 
window owns each of the pixels. If a particular process does not "own" a pixel, it fails the 
pixel-ownership test relative to the framebuffer, and that pixel is thrown away. 

Under the typical paradigm, the pixel-ownership test is run by each process. 
For a given pixel location in the framebuffer, that pixel passes the pixel-ownership test for 
at most one of the processes and fails the pixel-ownership test for all other processes. Only 
one process owns a particular framebuffer pixel at any given time. 

In some rendering schemes, the pixel-ownership test may not be particularly 
relevant. For example, if the scene is being rendered to an off-screen buffer and 
subsequently block transferred ("blitted") to the desktop, pixel ownership is not 
particularly relevant. Each pixel that a process tests automatically or necessarily passes the 
pixel-ownership test (if it is even executed) because each process effectively owns its own 
off-screen buffer and nothing is in front of that buffer. 

If for a particular process, the pixel is not owned by that process, writing a 
pixel value to that location is unnecessary. All subsequent processing for that pixel may be 
ignored. In a typical workstation, all the data associated with a particular pixel on the 
screen is read during rasterization. All information for any polygon that feeds that pixel is 
read, including information as to the identity of the process that owns that framebuffer 
pixel, as well as the z-buffer, the color value, the old color value, the alpha value, stencil 
bits and so forth. 

If a process owns the pixel, then the other downstream processes are 
executed (for example, scissor test, alpha test and the like). 

- Scissor Test 

The scissor test determines if (x_window, y_window) lies within a scissor 
rectangle defined by four coordinate values corresponding to a left bottom (left, bottom) 
coordinate, a width of the rectangle and a height of the rectangle. (See, for example, the 
OpenGL procedure Scissor(left, bottom, width, height).) If left <= x_window < left+width and 
bottom <= y_window < bottom+height, then the scissor test passes. Otherwise, the scissor test 
fails, and the particular fragment being tested is discarded. 
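
The scissor inequality can be written directly as a predicate. This is a minimal sketch; the function name is hypothetical, and the half-open interval (inclusive lower bound, exclusive upper bound) follows the inequality stated above.

```python
# Minimal sketch of the scissor test as a half-open rectangle check.
# The function name is an illustrative assumption.

def scissor_test(x, y, left, bottom, width, height):
    """Return True if window coordinate (x, y) lies in the scissor rectangle."""
    return left <= x < left + width and bottom <= y < bottom + height

inside = scissor_test(10, 10, left=10, bottom=10, width=5, height=5)
on_right_edge = scissor_test(15, 10, left=10, bottom=10, width=5, height=5)
```

With a half-open interval, two scissor rectangles that share an edge never both claim the same pixel.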

In simple terms, a scissor rectangle defines a screen-aligned region. This 
scissor rectangle is useful in that only pixels from a polygon that fall in that screen-aligned 
scissor rectangle change. In the event that a polygon straddles the scissor rectangle, only 
those pixels that are inside the rectangle may change. An implementation may allow more 
than one scissor rectangle. A scissor rectangle list can be used for rendering to a window 
that is partially obscured such that the visible portion of the window consists of more than 
one rectangular region. 

Just as with the pixel-ownership test, the scissor test provides means for 
discarding pixels and/or fragments before they actually get to the framebuffer to cause the 
output to change. 

When a polygon comes down the pipeline, the pipeline calculates 
everything it needs to determine the z-value and color of each pixel the polygon covers. Once the z-value and color 
are determined, that information helps to determine what information is placed in the 
framebuffer, thereby determining what is on the display screen. 

- Stipple Test 

The stipple test uses a 32x32-bit window-aligned stipple pattern. The stipple 
pattern is a mask of 0s and 1s. The stipple pattern is tiled on the window. The stipple test 
passes if the bit in the stipple pattern at (x_window, y_window) is set, i.e. is 1. Otherwise, the 
stipple test fails, and the particular fragment being tested is discarded. 
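
Tiling a 32x32 pattern over the window amounts to indexing the pattern with the window coordinates modulo 32. The sketch below assumes the pattern is stored as a list of 32 rows of 32 bits; that storage layout, the function name and the checkerboard pattern are illustrative assumptions.

```python
# Minimal sketch of a window-aligned 32x32 stipple test. The row-major
# list-of-lists layout and the checkerboard pattern are assumptions.

STIPPLE_SIZE = 32

def stipple_test(pattern, x, y):
    """Pass if the tiled stipple bit at window coordinate (x, y) is 1."""
    return pattern[y % STIPPLE_SIZE][x % STIPPLE_SIZE] == 1

# Checkerboard pattern: bit is 1 where x + y is even.
checkerboard = [[(col + row + 1) % 2 for col in range(STIPPLE_SIZE)]
                for row in range(STIPPLE_SIZE)]
```

Because the index wraps every 32 pixels, a fragment at (x + 32, y) always gets the same result as a fragment at (x, y).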

- Alpha Test 

Color is defined by four values, red (R), green (G), blue (B) and alpha (A). 
The RGB values define the contribution from each of the primary colors, and alpha is 
related to the transparency. Typically, color is a 32-bit value, 8 bits for each component, 
though such representation is not limited to 32 bits. The alpha test compares the alpha 
value of a given pixel to an alpha-reference value. Any pixel not passing the alpha test is 
thrown away or otherwise discarded. 

The type of comparison may also be specified. For example, the 
comparison may be a greater-than operation, a less-than operation and so forth. If the 
comparison is a greater-than operation, then the pixel's alpha value has to be greater than 
the reference to pass the alpha test. So if the pixel's alpha value is 0.9, the reference alpha 
is 0.8 and the comparison is greater-than, then that pixel passes the alpha test. 
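
The worked example above (alpha 0.9 against a reference of 0.8 with a greater-than comparison) can be sketched as a table of selectable comparison functions. The function name and the string keys are hypothetical; the set of comparisons shown is only a subset chosen for illustration.

```python
# Minimal sketch of an alpha test with a selectable comparison.
# Function name and comparison keys are illustrative assumptions.
import operator

ALPHA_COMPARISONS = {
    "less": operator.lt,
    "equal": operator.eq,
    "greater": operator.gt,
}

def alpha_test(alpha, reference, comparison):
    """Return True if the fragment's alpha passes against the reference."""
    return ALPHA_COMPARISONS[comparison](alpha, reference)

passes = alpha_test(0.9, 0.8, "greater")
fails = alpha_test(0.7, 0.8, "greater")
```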

The alpha test is a per-fragment operation and happens after all of the 
fragment coloring calculations and lighting and shading operations are completed. Each of 
these per-fragment operations may be thought of as part of the conventional z-buffer 
blending operations. 

- Color Test 

The color test is similar to the alpha test described hereinbefore, except that 
rather than performing the magnitude or logical comparisons between the pixel alpha (A) 
value and a reference value, the color test performs a magnitude or logical comparison 
between one or a combination of the R, G or B color components and reference value(s). 
Whereas the alpha test typically has one reference value, the color 
test effectively has two reference values per component, a maximum value and a minimum 
value. 

The comparison test may be, for example, greater-than, less-than, equal-to, 
greater-than-or-equal-to, "greater-than-c1 and less-than-c2," where c1 and c2 are 
predetermined reference values, and so forth. One might, for example, specify a reference 
minimum R value and a reference maximum R value, such that the color test passes only if 
the pixel R value is between that minimum and maximum. The color test might be useful 
to provide blue-screen functionality, for example. 
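
A range-style color test with a minimum and a maximum reference per component can be sketched as follows. The function name is hypothetical, and treating the bounds as inclusive is an assumption; the text above says only that the value must lie between the minimum and the maximum.

```python
# Minimal sketch of a per-component range color test. Inclusive bounds
# and the function name are assumptions made for illustration.

def color_test(rgb, minimum, maximum):
    """Pass only if every component lies within its [min, max] range."""
    return all(lo <= c <= hi for c, lo, hi in zip(rgb, minimum, maximum))

# Hypothetical "blue-screen" key range: little red and green, strong blue.
key_min = (0.0, 0.0, 0.8)
key_max = (0.2, 0.2, 1.0)

is_key_color = color_test((0.1, 0.1, 0.9), key_min, key_max)
is_foreground = color_test((0.7, 0.5, 0.3), key_min, key_max)
```

For blue-screen use, an application would typically discard the fragments that fall inside the key range; whether the test keeps or discards in-range fragments is a matter of how the comparison is configured.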

- Stencil Test 

The stencil test conditionally discards a fragment based on the outcome of a 
comparison between a value stored in a stencil buffer at location (x_window, y_window) and a 
reference value. If the stencil test fails, the incoming fragment is discarded, although the 
corresponding stencil buffer value may be modified in accordance with the specified 
stencil operation to be carried out on failing the stencil test. 

When an object is rendered into the framebuffer, a tag having the stencil bits 
is also written into the framebuffer. These stencil bits are part of the pipeline state. The 
type of the stencil test to perform can be specified at the time the geometry is rendered. 

The stencil bits are used to implement various filtering, masking or 
stenciling operations. For example, if a particular fragment ends up affecting a particular 
pixel in the framebuffer, then the stencil bits can be written to the framebuffer along with 
the pixel information. 

Several stencil comparison functions are permitted such that the stencil test 
passes never, always or if the reference value is less than, less than or equal to, equal to, 
greater than or equal to, greater than, or not equal to the masked stored value in the stencil 
buffer. 
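
The eight stencil comparison functions listed above can be sketched as a lookup table. The function name and string keys are hypothetical. Applying the mask to the reference value as well as to the stored value follows the OpenGL convention and is an assumption here, since the text above mentions only the masked stored value.

```python
# Minimal sketch of the stencil comparison. Masking both operands is an
# assumption borrowed from OpenGL; names and keys are illustrative.
import operator

STENCIL_FUNCS = {
    "never": lambda ref, stored: False,
    "always": lambda ref, stored: True,
    "less": operator.lt,
    "lequal": operator.le,
    "equal": operator.eq,
    "gequal": operator.ge,
    "greater": operator.gt,
    "notequal": operator.ne,
}

def stencil_test(reference, stored, func, mask=0xFF):
    """Compare the masked reference value with the masked stored value."""
    return STENCIL_FUNCS[func](reference & mask, stored & mask)

# With mask 0x0F only the low four bits take part in the comparison.
low_bits_match = stencil_test(0x13, 0x23, "equal", mask=0x0F)
full_match = stencil_test(0x13, 0x23, "equal")
```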

The reference value and the comparison value can have multiple bits, 
typically 8 bits so that 256 different values may be represented. 

- Depth-Buffer Test 

The depth-buffer test discards the incoming fragment if a depth comparison 
fails. The comparison is programmatically enabled or disabled. When the depth test is 
disabled, the depth comparison and subsequent possible updates to the depth-buffer value 
are bypassed, and a fragment is passed to the next operation. The stencil bits are also 
involved and may be modified even if the test is bypassed. In this case, the stencil value is 
modified as if the depth-buffer test passed. 

If the depth test is enabled, the depth comparison takes place and the depth 
buffer and stencil value may subsequently be modified. 

Depth comparisons are implemented in which possible outcomes are as 
follows: the depth-buffer test passes never, always or if the incoming fragment's z_window 
value is less than, less than or equal to, equal to, greater than, greater than or equal to, or 
not equal to the depth value stored at the location given by the incoming fragment's 
(x_window, y_window) coordinates. If the depth-buffer test fails, the incoming fragment is 
discarded. The stencil value at the fragment's (x_window, y_window) coordinate is updated 
according to the function currently in effect for depth-buffer test failure. Otherwise, the 
fragment continues to the next operation and the value of the depth buffer at the fragment's 
(x_window, y_window) location is set to the fragment's z_window value. In this case the stencil 
value is updated according to the function currently in effect for depth-buffer test success. 
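
The interaction between the depth comparison, the depth-buffer update and the stencil update can be sketched as below. The buffers are modeled as nested lists, and the stencil update functions are passed in as callables; these representations, the function name and the parameter names are assumptions for illustration only.

```python
# Minimal sketch of the depth-buffer test with stencil side effects.
# Buffer layout, function name and parameters are illustrative assumptions.
import operator

DEPTH_FUNCS = {
    "never": lambda z, stored: False,
    "always": lambda z, stored: True,
    "less": operator.lt,
    "lequal": operator.le,
    "equal": operator.eq,
    "gequal": operator.ge,
    "greater": operator.gt,
    "notequal": operator.ne,
}

def depth_buffer_test(x, y, z, depth, stencil, func="less", enabled=True,
                      on_fail=lambda s: s, on_pass=lambda s: s):
    """Return True if the fragment continues to the next operation."""
    if enabled and not DEPTH_FUNCS[func](z, depth[y][x]):
        stencil[y][x] = on_fail(stencil[y][x])  # depth-fail stencil function
        return False                            # fragment is discarded
    if enabled:
        depth[y][x] = z                         # store the closer depth
    # When disabled, the stencil is updated as if the test had passed.
    stencil[y][x] = on_pass(stencil[y][x])
    return True

depth = [[0.5]]
stencil = [[0]]
increment = lambda s: s + 1
first = depth_buffer_test(0, 0, 0.3, depth, stencil, on_pass=increment)
second = depth_buffer_test(0, 0, 0.7, depth, stencil, on_pass=increment)
```

In this run the first fragment is closer than the stored depth, so it passes, updates the depth buffer and increments the stencil value; the second is farther and is discarded without changing the depth buffer.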

- Blending 

Blending combines the incoming fragment's R, G, B and A values with the 
R, G, B and A values stored in the framebuffer at the incoming fragment's 
(x_window, y_window) location. This blending is typically dependent on the incoming 
fragment's alpha value (A) and that of the corresponding framebuffer stored pixel. (In the 
following discussion, "Cs" refers to the source color for an incoming fragment, "Cd" refers 
to the destination color at the corresponding framebuffer location, and "Cc" refers to a 
constant color in the GL state. Subscripts of 's', 'd' and 'c' respectively denote individual 
RGBA components of these colors.) 

Generally speaking, blending is an operation that takes color in the 
framebuffer and the color in the fragment and blends them together. The manner in which 
blending is achieved, that is, the particular blending function, may be selected from various 
alternatives for both the source and destination. 

For example, an additive-type blend is available wherein a blend result (C) 
is obtained by adding the product of a source color (Cs) by a source weighting-factor 
quadruplet (S) to the product of a destination color (Cd) and a destination weighting-factor 
quadruplet (D), that is, C = Cs S + Cd D. Alternatively, the blend equation may be a 
subtraction (C = Cs S - Cd D), a reverse subtraction (C = Cd D - Cs S), a minimum function 
(C = min(Cs, Cd)), or a maximum function (C = max(Cs, Cd)). The blending equation is 
evaluated separately for each color component and its corresponding weighting coefficient. 
Each of the four R, G, B, A components has its own weighting factor. 
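
The five blend equations can be sketched component by component as follows. Colors and weighting-factor quadruplets are modeled as 4-tuples (R, G, B, A); the function name and equation keys are hypothetical, and clamping of the result to the representable range is omitted for brevity.

```python
# Minimal sketch of per-component blending. Names and keys are
# illustrative assumptions; result clamping is deliberately omitted.

def blend(cs, cd, s, d, equation="add"):
    """Blend source color cs with destination color cd, per component."""
    if equation == "min":
        return tuple(min(a, b) for a, b in zip(cs, cd))
    if equation == "max":
        return tuple(max(a, b) for a, b in zip(cs, cd))
    combine = {
        "add": lambda src, dst: src + dst,               # C = Cs S + Cd D
        "subtract": lambda src, dst: src - dst,          # C = Cs S - Cd D
        "reverse_subtract": lambda src, dst: dst - src,  # C = Cd D - Cs S
    }[equation]
    return tuple(combine(c_s * w_s, c_d * w_d)
                 for c_s, c_d, w_s, w_d in zip(cs, cd, s, d))

source = (1.0, 0.0, 0.0, 0.5)       # half-transparent red fragment
destination = (0.0, 0.0, 1.0, 1.0)  # opaque blue already in the framebuffer
half = (0.5, 0.5, 0.5, 0.5)
blended = blend(source, destination, half, half)
```

Choosing S equal to the source alpha and D equal to one minus the source alpha gives the familiar transparency blend; here both quadruplets are simply set to 0.5.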

The blending test (or blending equation) is part of the pipeline state and can 
potentially change for every polygon but, more typically, changes only for an object made 
up of several polygons. 

In general, blending is performed only after other tests such as the pixel- 
ownership test and stencil test have passed. Then it is clear that the pixel or fragment 
under consideration would or could have an effect in the output. 

- Dithering 

Dithering selects between two color values or indices. In RGBA mode, the 
value of any of the color components is essentially a fixed-point value, c, with m bits to 
the left of the binary point, where m is the number of bits allocated to that component in 
the framebuffer. For each c, dithering selects a value c' such that c' ∈ {max{0, Ceiling(c) - 1}, 
Ceiling(c)}. In color index mode, the same rule applies with c being a single-color 
index. This selection may depend on the x_window and y_window coordinates of the pixel. 
(The value of c cannot be larger than the maximum value representable in the framebuffer 
for the color component.) 
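
One way to make the selection between the two candidate values depend on the window coordinates is an ordered dither. The sketch below uses a 2x2 threshold matrix, which is an assumption chosen for illustration and not an algorithm named in the text; the function name is likewise hypothetical.

```python
# Minimal sketch of ordered dithering between max(0, ceil(c) - 1) and
# ceil(c). The 2x2 threshold matrix is an illustrative assumption.
import math

THRESHOLDS = [[0.0, 0.5],
              [0.75, 0.25]]

def dither(c, x, y):
    """Pick one of the two candidate values for c using the pixel position."""
    high = math.ceil(c)
    low = max(0, high - 1)
    fraction = c - low  # how far c sits above the lower candidate
    return high if fraction > THRESHOLDS[y % 2][x % 2] else low

# For c = 3.4 the four pixels of a 2x2 block get a mix of 3 and 4.
block = [[dither(3.4, x, y) for x in range(2)] for y in range(2)]
```

Over the 2x2 block, two pixels take the value 4 and two take the value 3, so the block's average (3.5) approximates the requested 3.4 more closely than uniform truncation to 3 would.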

Although many dithering algorithms are possible, a dithered value produced 
by any algorithm generally depends on only the incoming value and the fragment's x and y 
window coordinates. When dithering is disabled, each color component is truncated to a 
fixed-point value with as many bits as there are in the corresponding framebuffer 
component. 

- Logical Operations 

A final logical operation applies between the incoming fragment's color or 
index values and the color or index values stored in the framebuffer at the corresponding 
location. The result of the logical operation replaces the values in the framebuffer at the 
fragment's (x, y) coordinates. Various logical operations may be implemented between 
source (s) and destination (d), including for example: CLEAR, SET, AND, NOOP, XOR, 
OR, NOR, NAND, INVERT, COPY, INVERTED AND, EQUIVALENCE, REVERSE 
OR, REVERSE AND, INVERTED COPY and INVERTED OR. Logical operations are 
performed independently for each color-index buffer that is selected for writing or for each 
red, green, blue and alpha value of each color buffer that is selected for writing. 
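
The sixteen logical operations can be sketched as bitwise functions of a source value s and a destination value d. The dictionary keys, the function name and the 8-bit default width are assumptions; the definitions of the "reverse" and "inverted" variants follow the OpenGL convention (for example, REVERSE AND is s AND NOT d, INVERTED AND is NOT s AND d).

```python
# Minimal sketch of the sixteen logical operations on one component.
# Key names, function name and the 8-bit width are illustrative; the
# reverse/inverted definitions follow the OpenGL convention.

LOGIC_OPS = {
    "CLEAR": lambda s, d: 0,
    "SET": lambda s, d: ~0,
    "COPY": lambda s, d: s,
    "INVERTED_COPY": lambda s, d: ~s,
    "NOOP": lambda s, d: d,
    "INVERT": lambda s, d: ~d,
    "AND": lambda s, d: s & d,
    "NAND": lambda s, d: ~(s & d),
    "OR": lambda s, d: s | d,
    "NOR": lambda s, d: ~(s | d),
    "XOR": lambda s, d: s ^ d,
    "EQUIVALENCE": lambda s, d: ~(s ^ d),
    "REVERSE_AND": lambda s, d: s & ~d,
    "INVERTED_AND": lambda s, d: ~s & d,
    "REVERSE_OR": lambda s, d: s | ~d,
    "INVERTED_OR": lambda s, d: ~s | d,
}

def logic_op(name, s, d, bits=8):
    """Apply the named operation and keep only the component's bits."""
    return LOGIC_OPS[name](s, d) & ((1 << bits) - 1)

xor_result = logic_op("XOR", 0b1100, 0b1010)
```

The final mask keeps the result within the component's bit width, since Python's bitwise NOT yields a negative integer.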
Antialiasing 

Pixels are the smallest individually controllable element of the display 
device. However, with images quantized into discrete pixels, spatial aliasing occurs. A 
typical aliasing artifact is a "staircase" effect caused when a straight line or edge cuts 
diagonally across rows of pixels. 

Some rendering systems reduce aliasing effects by dividing pixels into sub- 
pixels, where each sub-pixel can be colored independently. When the image is to be 
displayed, the colors for all sub-pixels within each pixel are blended together to form an 
average color for the pixel. A renderer that uses up to 16 sub-pixels per pixel is described 
in Akeley, "RealityEngine Graphics," SIGGRAPH93 Proceedings, 1-6 August 1993, 
Computer Graphics Proceedings, Annual Conference Series, pages 109 to 116 (ACM 
SIGGRAPH, New York, 1993, Softcover ISBN 0-201-58889-7 and CD-ROM ISBN 0- 
201-56997-3, herein "Akeley" and incorporated by reference). 

Carpenter, "The A-buffer, an Antialiased Hidden Surface Method,"
SIGGRAPH 1984 Conference Proceedings, pp.103-108 ( , 19_, herein "Carpenter"
and incorporated by reference), describes another prior-art antialiasing method, the
A-Buffer method. (Akeley also describes this technique.) The A-buffer is an antialiasing
technique that reduces aliasing by keeping track of the percent coverage of a pixel by a
rendered polygon.

The sub-pixel antialiasing approach is not without its problems. Assuming
each pixel is divided into an n*m number of sub-pixels, some, if not all, of the computations
in the fragment-operations pipeline increase in number by a factor of n*m.

A counter approach to the n*m sub-pixels is the use of samples. Given n*m
sub-pixels per pixel, prior-art fragment-operations pipelines select a fixed number H of
these n*m sub-pixels from H fixed locations to represent the entire pixel. The fragment
operations are applied to the H samples. At the end of the pipeline, each of the H samples
is given the same weight in re-creating the pixel.

Additionally, all of the per-fragment operations of prior-art
fragment-operations pipelines are done on a per-pixel basis where samples and sub-pixels
have not been implemented. Where sub-pixels or samples are implemented, all of the
per-fragment operations are done on a respective per-sub-pixel or per-sample basis.

However, fixing the number, location and weight of samples restricts the
flexibility of the fragment-operations pipeline and decreases the effectiveness of its
techniques. Likewise, performing all fragment operations on a per-pixel, per-sub-pixel or
per-sample basis restricts the flexibility of the fragment-operations pipeline and decreases
its effectiveness.

The main drawback to the A-buffer technique is the need to sort polygons 
front-to-back (or back-to-front) at each pixel in order to get acceptable antialiased 
polygons. 

Accordingly, there is a need for a multi-dimensionally flexible per-fragment
pipeline. There is always a need for an antialiasing method that improves on the rendered
image.

These and other goals of the invention will be readily apparent to one of
skill in the art on reading the background above and the description below.

Summary 

Herein are described apparatus and methods for rendering 3D-graphics
images with and without anti-aliasing. In one embodiment, the apparatus include a port for
receiving commands from a graphics application, an output for sending a rendered image
to a display and a fragment-operations pipeline, coupled to the port and to the output, the
pipeline including a stage for performing a fragment operation on a fragment on a per-pixel
basis, as well as a stage for performing a fragment operation on the fragment on a
per-sample basis.

In one embodiment, the stage for performing on a per-pixel basis is one of
the following: a scissor-test stage, a stipple-test stage, an alpha-test stage or a color-test
stage. The stage for performing on a per-sample basis is one of the following: a Z-test
stage, a blending stage or a dithering stage.

In another embodiment, the apparatus programmatically selects whether to
perform a stencil test on a per-pixel or a per-sample basis and performs the stencil test on
the selected basis.

In another embodiment, the apparatus programmatically selects a set of 
subdivisions of a pixel as samples for use in the per-sample fragment operation and 
performs the per-sample fragment operation, using the programmatically selected samples. 

In another embodiment, the apparatus programmatically allows
primitive-based anti-aliasing, i.e. the anti-aliasing may be turned on or off on a
per-primitive basis.

In another embodiment, the apparatus programmatically performs several
passes through the geometry. The apparatus selects the first set of subdivisions of a pixel
as samples for use in the per-sample fragment operation and performs the per-sample
fragment operation, using the programmatically selected samples. It then
programmatically selects a different set of the pixel subdivisions as samples for use in a
second per-sample fragment operation and then performs the second per-sample fragment
operation, using the programmatically selected samples.

The color values resulting from the second pass are accumulated with the
color values from the first pass. Several passes can be performed to effectively increase the
number of samples per pixel. The sample locations for each pass are different and the pixel
color values are accumulated with the results of the previous passes.

The apparatus programmatically selects a set of subdivisions of a pixel as
samples for use in the per-sample fragment operation, programmatically assigns weights to
the samples in the set and performs the per-sample fragment operation on the fragment.
The apparatus programmatically determines the method for combining the color values of
the samples in a pixel to obtain the resulting color in the framebuffer at the pixel location.
In addition, the apparatus programmatically selects the depth value assigned to a pixel in
the depth buffer from the depth values of all the samples in the pixel.
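
One way to picture differently weighted samples is the Python sketch below. It assumes
that the combining method is a normalized weighted average and that the pixel depth is
chosen as either the nearest or the farthest sample depth; both choices, and the names
`resolve_pixel` and `resolve_depth`, are illustrative assumptions, since the description
leaves the combining method and the depth selection programmable.

```python
def resolve_pixel(samples, weights):
    """Combine per-sample RGBA colors into one pixel color by weighted average."""
    total = sum(weights)
    return tuple(
        sum(w * s[ch] for s, w in zip(samples, weights)) / total
        for ch in range(4)
    )

def resolve_depth(depths, mode="nearest"):
    """Select the pixel depth from the sample depths (hypothetical modes)."""
    return min(depths) if mode == "nearest" else max(depths)
```

With weights of 3 and 1, a red sample and a black sample resolve to a pixel that is
three-quarters red.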

The apparatus includes a method to clear the color, depth, and stencil
buffers partially or fully, without a read-modify-write operation on the framebuffer.

The apparatus includes a method for considering per-pixel depth values
assigned to the polygon as well as the depth values interpolated from those specified at the
vertices of the polygon.

The apparatus includes a method for considering per-pixel stencil values
assigned to the polygon in the stencil test, as well as the specified stencil reference value
of the polygon.

The apparatus includes a method for determining if any pixel in the scene is
visible on the screen without updating the color buffer.

Brief Description of the Drawings 

FIG. 1 shows a three-dimensional object, a tetrahedron, in various
coordinate systems.

FIG. 2 is a block diagram illustrating the components and data flow in the
pixel block.

FIG. 3 is a high-level block diagram illustrating the components and data
flow in a 3D-graphics pipeline incorporating the invention.

FIG. 4 illustrates the relationship of samples to pixels and stamps and the
default sample grid, count and locations according to one embodiment.

FIG. 5 is a block diagram of the pixel-out unit.

FIG. 6 is a reproduction of the Deering et al. generic 3D-graphics pipeline.

FIG. 7 is a method-flow diagram of the pipeline of FIG. 3.

FIG. 8 illustrates a system for rendering three-dimensional graphics images.

FIG. 9 shows an example of how the cull block produces fragments from a
partially obscured triangle.

FIG. 10 demonstrates how the pixel block processes a stamp's worth of
fragments.

FIGS. 11 and 12 are alternative embodiments of a 3D-graphics pipeline
incorporating the invention.

Description of Specific Embodiments 

ABBREVIATIONS

Following are abbreviations which may appear in this description, along 
with their expanded meaning: 

BKE: the back-end block 84C.
CUL: the cull unit 846.
MIJ: the mode-injection unit 847.
PHG: the Phong unit 84A.
PIX: the pixel block 84B.
PXO: the pixel-out unit 280.
SRT: the sort unit 844.
TEX: the texture unit 849.
VSP: a visible stamp portion.



Overview 

- The Rendering System 
FIG. 8 illustrates a system 800 for rendering three-dimensional graphics
images. The rendering system 800 includes one or more of each of the following:
data-processing units (CPUs) 810, memory 820, a user interface 830, a co-processor 840
such as a graphics processor, communication interface 850 and communications bus 860.

Of course, in an embedded system, some of these components may be
missing, as is well understood in the art of embedded systems. In a distributed computing
environment, some of these components may be on separate physical machines, as is well
understood in the art of distributed computing.

The memory 820 typically includes high-speed, volatile random-access
memory (RAM), as well as non-volatile memory such as read-only memory (ROM) and
magnetic disk drives. Further, the memory 820 typically contains software 821. The
software 821 is layered: Application software 8211 communicates with the operating
system 8212, and the operating system 8212 communicates with the I/O subsystem 8213.
The I/O subsystem 8213 communicates with the user interface 830, the co-processor 840
and the communications interface 850 by means of the communications bus 860.

The user interface 830 includes a display monitor 831.

The communications bus 860 communicatively interconnects the CPU 810,
memory 820, user interface 830, graphics processor 840 and communication interface 850.

The memory 820 may include spatially addressable memory (SAM). A
SAM allows spatially sorted data stored in the SAM to be retrieved by its spatial
coordinates rather than by its address in memory. A single SAM query operation can
identify all of the data within a specified spatial volume, performing a large number of
arithmetic comparisons in a single clock cycle. , U.S. Patent No. 4,996,666, "
(19 ) further describes SAMs and is incorporated herein by reference.

The address space of the co-processor 840 may overlap, be adjacent to
and/or disjoint from the address space of the memory 820, as is well understood in the art
of memory mapping. If, for example, the CPU 810 writes to an accelerated graphics port at
a predetermined address and the graphics co-processor 840 reads at that same
predetermined address, then the CPU 810 can be said to be writing to a graphics port and
the graphics processor 840 to be reading from such a graphics port.

The graphics processor 840 is implemented as a graphics pipeline, this
pipeline itself possibly containing one or more pipelines. FIG. 3 is a high-level block
diagram illustrating the components and data flow in a 3D-graphics pipeline 840
incorporating the invention. The 3D-graphics pipeline 840 includes a
command-fetch-and-decode block 841, a geometry block 842, a mode-extraction block 843,
a sort block 844, a setup block 845, a cull block 846, a mode-injection block 847, a
fragment block 848, a texture block 849, a Phong block 84A, a pixel block 84B, a back-end
block 84C and sort, polygon, texture and framebuffer memories 84D, 84E, 84F, 84G. The
memories 84D, 84E, 84F, 84G may be a part of the memory 820.

FIG. 7 is a method-flow diagram of the pipeline of FIG. 3. FIGS. 11 and
12 are alternative embodiments of a 3D-graphics pipeline incorporating the invention.

The command-fetch-and-decode block 841 handles communication with the
host computer through the graphics port. It converts its input into a series of packets,
which it passes to the geometry block 842. Most of the input stream consists of
geometrical data, that is to say, lines, points and polygons. The descriptions of these
geometrical objects can include colors, surface normals, texture coordinates and so on.
The input stream also contains rendering information such as lighting, blending modes and
buffer functions.

The geometry block 842 handles four major tasks: transformations,
decomposition of all polygons into triangles, clipping and per-vertex lighting calculations
for Gouraud shading.

The geometry block 842 transforms incoming graphics primitives into a
uniform coordinate space ("world space"). It then clips the primitives to the viewing
volume ("frustum"). In addition to the six planes that define the viewing volume (left,
right, top, bottom, front and back), the subsystem provides six user-definable clipping
planes. After clipping, the geometry block 842 breaks polygons with more than three
vertices into sets of triangles to simplify processing.

Finally, if there is any Gouraud shading in the frame, the geometry block 
842 calculates the vertex colors that the fragment block 848 uses to perform the shading. 

The mode-extraction block 843 separates the data stream into two parts:
vertices and everything else. Vertices are sent to the sort block 844. Everything else
(lights, colors, texture coordinates, etc.), it stores in the polygon memory 84E, whence it
can be retrieved by the mode-injection block 847. The polygon memory 84E is double
buffered, so the mode-injection block 847 can read data for one frame while the
mode-extraction block 843 is storing data for the next frame.

The mode data stored in the polygon memory falls into three major
categories: per-frame data (such as lighting), per-primitive data (such as material
properties) and per-vertex data (such as color). The mode-extraction and mode-injection
blocks 843, 847 further divide these categories to optimize efficiency.

For each vertex, the mode-extraction block 843 sends the sort block 844 a
packet containing the vertex data and a pointer (the "color pointer") into the polygon
memory 84E. The packet also contains fields indicating whether the vertex represents a
point, the endpoint of a line or the corner of a triangle. The vertices are sent in a strictly
time-sequential order, the same order in which they were fed into the pipeline. The packet
also specifies whether the current vertex forms the last one in a given primitive, that is to
say, whether it completes the primitive. In the case of triangle strips ("fans") and line
strips ("loops"), the vertices are shared between adjacent primitives. In this case, the
packets indicate how to identify the other vertices in each primitive.

The sort block 844 receives vertices from the mode-extraction block 843
and sorts the resulting points, lines and triangles by tile. (A tile is a data structure
described further below.) In the double-buffered sort memory 84D, the sort block 844
maintains a list of vertices representing the graphic primitives and a set of tile pointer lists,
one list for each tile in the frame. When the sort block 844 receives a vertex that completes
a primitive, it checks to see which tiles the primitive touches. For each tile a primitive
touches, the sort block adds a pointer to the vertex to that tile's tile pointer list.

When the sort block 844 has finished sorting all the geometry in a frame, it
sends the data to the setup block 845. Each sort-block output packet represents a complete
primitive. The sort block 844 sends its output in tile-by-tile order: all of the primitives that
touch a given tile, then all of the primitives that touch the next tile, and so on. Thus, the
sort block 844 may send the same primitive many times, once for each tile it touches.
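
The tile pointer lists and the tile-by-tile output order can be sketched in Python as
follows. The sketch assumes that a primitive "touches" every tile overlapped by its
bounding box (a conservative test that the description does not specify), that window
coordinates are non-negative, and that a tile is 16x16 pixels as in one embodiment
described below. The function names are hypothetical.

```python
from collections import defaultdict

TILE_SIZE = 16  # pixels per tile side (16x16-pixel tile of one embodiment)

def tiles_touched(vertices):
    """Conservatively list the tiles overlapped by a primitive's bounding box."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    tx0, tx1 = int(min(xs)) // TILE_SIZE, int(max(xs)) // TILE_SIZE
    ty0, ty1 = int(min(ys)) // TILE_SIZE, int(max(ys)) // TILE_SIZE
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]

def sort_by_tile(primitives):
    """Build one pointer list per tile; indices into `primitives` act as pointers."""
    tile_lists = defaultdict(list)
    for index, prim in enumerate(primitives):
        for tile in tiles_touched(prim):
            tile_lists[tile].append(index)
    return tile_lists

def emit_tile_order(primitives, tile_lists):
    """Yield (tile, primitive) pairs tile by tile; a primitive repeats per tile."""
    for tile in sorted(tile_lists):
        for index in tile_lists[tile]:
            yield tile, primitives[index]
```

A triangle that spans two tiles appears in both pointer lists and is therefore emitted
twice, once with each tile.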

The setup block 845 calculates spatial derivatives for lines and triangles.
The block 845 processes one tile's worth of data, one primitive at a time. When the block
845 is done, it sends the data on to the cull block 846.

The setup block 845 also breaks stippled lines into separate line segments 
(each a rectangular region) and computes the minimum z value for each primitive within 
the tile. 

Each packet output from the setup block 845 represents one primitive: a
triangle, line segment or point.

The cull block 846 accepts data one tile's worth at a time and divides its
processing into two steps: SAM culling and sub-pixel culling. The SAM cull discards
primitives that are hidden completely by previously processed geometry. The sub-pixel
cull takes the remaining primitives (which are partly or entirely visible) and determines the
visible fragments. The sub-pixel cull outputs one stamp's worth of fragments at a time,
herein a "visible stamp portion." (A stamp is a data structure described further below.)

FIG. 9 shows an example of how the cull block 846 produces fragments
from a partially obscured triangle. A visible stamp portion produced by the cull block 846
contains fragments from only a single primitive, even if multiple primitives touch the
stamp. Therefore, in the diagram, the output VSP contains fragments from only the gray
triangle. The fragment formed by the tip of the white triangle is sent in a separate VSP,
and the colors of the two VSPs are combined later in the pixel block 84B.

Each pixel in a VSP is divided into a number of samples to determine how
much of the pixel is covered by a given fragment. The pixel block 84B uses this
information when it blends the fragments to produce the final color of the pixel.

The mode-injection block 847 retrieves block-mode information (colors,
material properties, etc.) from the polygon memory 84E and passes it downstream as
required. To save bandwidth, the individual downstream blocks cache recently used mode
information. The mode-injection block 847 keeps track of what information is cached
downstream and only sends information as necessary.

The main work of the fragment block 848 is interpolation. The block 848
interpolates color values for Gouraud shading, surface normals for Phong shading and
texture coordinates for texture mapping. It also interpolates surface tangents for use in the
bump-mapping algorithm if bump maps are in use.

The fragment block 848 performs perspective-corrected interpolation using 
barycentric coefficients. 
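
A common form of perspective-corrected barycentric interpolation is sketched below in
Python. It assumes screen-space barycentric coefficients and a clip-space w per vertex,
and it interpolates a scalar attribute for brevity; the function name `interpolate` and its
parameters are illustrative and are not taken from this description.

```python
def interpolate(bary, w_clip, values):
    """Perspective-correct interpolation of a per-vertex attribute.

    bary   -- screen-space barycentric coefficients (b0, b1, b2), summing to 1
    w_clip -- clip-space w of each of the three vertices
    values -- attribute value at each vertex (scalars here, for brevity)
    """
    # Divide each coefficient by its vertex's w, then renormalize.
    scaled = [b / w for b, w in zip(bary, w_clip)]
    norm = sum(scaled)
    return sum(s * v for s, v in zip(scaled, values)) / norm
```

When all three w values are equal, the result reduces to ordinary linear interpolation.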

The texture block 849 applies texture maps to the pixel fragments. Texture
maps are stored in the texture memory 84F. Unlike the other memory stores described
previously, the texture memory 84F is single buffered. It is loaded from the memory 820
using the graphics port interface.

Textures are mip-mapped. That is to say, each texture comprises a series of
texture maps at different levels of detail, each map representing the appearance of the
texture at a given distance from the eye point. To reproduce a texture value for a given
pixel fragment, the texture block 849 performs tri-linear interpolation from the texture
maps, to approximate the correct level of detail. The texture block 849 also performs other
interpolation methods, such as anisotropic interpolation.
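
Tri-linear interpolation can be sketched in Python as a blend of two bilinear lookups
taken from the mip levels that bracket the desired level of detail. The sketch assumes
square, single-channel levels stored as nested lists and a simplified texel-addressing
convention; these, and the names `bilinear` and `trilinear`, are assumptions made for
illustration only.

```python
import math

def bilinear(level, u, v):
    """Bilinearly sample a square 2D list of scalars at normalized (u, v)."""
    n = len(level)
    x, y = u * (n - 1), v * (n - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, n - 1), min(y0 + 1, n - 1)
    fx, fy = x - x0, y - y0
    top = level[y0][x0] * (1 - fx) + level[y0][x1] * fx
    bottom = level[y1][x0] * (1 - fx) + level[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def trilinear(mipmaps, u, v, lod):
    """Blend bilinear samples from the two mip levels that bracket `lod`."""
    lod = min(max(lod, 0.0), len(mipmaps) - 1)
    lo = int(math.floor(lod))
    hi = min(lo + 1, len(mipmaps) - 1)
    t = lod - lo
    return bilinear(mipmaps[lo], u, v) * (1 - t) + bilinear(mipmaps[hi], u, v) * t
```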

The texture block 849 supplies interpolated texture values (generally as
RGBA color values) to the Phong block 84A on a per-fragment basis. Bump maps
represent a special kind of texture map. Instead of a color, each texel of a bump map
contains a height field gradient.

The Phong block 84A performs Phong shading for each pixel fragment. It
uses the material and lighting information supplied by the mode-injection block 847, the
texture colors from the texture block 849 and the surface normal generated by the fragment
block 848 to determine the fragment's apparent color. If bump mapping is in use, the
Phong block 84A uses the interpolated height field gradient from the texture block 849 to
perturb the fragment's surface normal before shading.

The pixel block 84B receives VSPs, where each fragment has an
independent color value. The pixel block 84B performs a scissor test, an alpha test, stencil
operations, a depth test, blending, dithering and logic operations on each sample in each
pixel. When the pixel block 84B has accumulated a tile's worth of finished pixels, it
combines the samples within each pixel (thereby performing antialiasing of pixels) and
sends them to the back end 84C for storage in the framebuffer 84G.

FIG. 10 shows a simple example of how the pixel block 84B may process a
stamp's worth of fragments. In this example, the pixel block receives two VSPs, one from
a gray triangle and one from a white triangle. It then blends the fragments and the
background color to produce the final pixels. In this example, the block 84B weights each
fragment according to how much of the pixel it covers or, to be more precise, by the
number of samples it covers. As mentioned before, this is a simple example. The apparatus
performs much more complex blending.
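
The simple coverage weighting of this example can be sketched in Python as follows.
The sketch assumes that the visible fragments of a pixel cover disjoint samples and that
uncovered samples take the background color; the function name `blend_by_coverage` and
its parameters are hypothetical.

```python
def blend_by_coverage(fragments, background, samples_per_pixel):
    """Weight each fragment's color by the number of samples it covers.

    fragments  -- list of (rgb_color, covered_sample_count) for one pixel
    background -- rgb color used for samples that no fragment covers
    """
    covered = sum(count for _, count in fragments)
    if covered > samples_per_pixel:
        raise ValueError("fragments cover more samples than the pixel has")
    contributions = fragments + [(background, samples_per_pixel - covered)]
    return tuple(
        sum(color[ch] * count for color, count in contributions) / samples_per_pixel
        for ch in range(3)
    )
```

For a four-sample pixel in which a gray fragment covers two samples and a white fragment
covers one, the remaining sample contributes the background color.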

(The pixel-ownership test is a part of the window system and is left to the 
back end 84C.) 

The back-end block 84C receives a tile's worth of pixels at a time from the
pixel block 84B and stores them into the framebuffer 84G. The back end 84C also sends a
tile's worth of pixels back to the pixel block 84B because specific framebuffer values can
survive from frame to frame. For example, stencil-bit values can remain constant over
many frames but can be used in all of those frames.

In addition to controlling the framebuffer 84G, the back-end block 84C
performs pixel-ownership tests and 2D drawing and sends the finished frame to the output
devices. The block 84C provides the interface between the framebuffer 84G and the
monitor 831 and video output.

- The Pixel Block 

The pixel block 84B is the last block before the back end 84C in the 3D
pipeline 840. It is responsible for performing per-fragment operations. In addition, the
pixel block 84B performs sample accumulation for anti-aliasing.

The pipeline stages before the pixel block 84B convert primitives into
VSPs. The sort block 844 collects the primitives for each tile. The cull block 846 receives
the data from the sort block in tile order and culls out parts of the primitives that do not
contribute to the rendered images. The cull block 846 generates the VSPs. The texture and
Phong blocks 849, 84A also receive the VSPs and are responsible for the texturing
and lighting of the fragments, respectively.

FIG. 2 is a block diagram illustrating the components and data flow in the
pixel block 84B. The block 84B includes FIFOs 210, an input filter 220 and queues 230,
240. The pixel block 84B also includes an input processor 290, caches 260, 270 and a
depth-interpolation unit 2L0. Also in pixel block 84B is a 3D pipeline 2M0 including
scissor-, stipple-, alpha-, color- and stencil/Z-test units 2A0, 2B0, 2C0, 2D0, 2E0, as well
as blending, dithering and logical-operations units 2F0, 2G0, 2H0. Per-sample stencil and
z buffers 2I0, per-sample color buffers 2J0, the pixel-out unit 280 and the per-pixel tile
buffers 2K0 also help compose the pixel block 84B.

In FIG. 2, the input FIFOs 210a and 210b receive inputs from the Phong
block 84A and the mode-injection block 847, respectively. The input FIFO 210a outputs to
the color queue 230, while the input FIFO 210b outputs to the input filter 220.

The input filter outputs to the pixel-out unit 280, the back-end block 84C
and the VSP queue 240.

The input processor 290 receives inputs from the queues 230, 240 and
outputs to the stipple and mode caches 260, 270, as well as to the depth-interpolation unit
2L0 and the 3D pipeline 2M0.

The first stage of the pipeline 2M0, the scissor-test unit 2A0, receives input
from the input processor 290 and outputs to the stipple-test unit 2B0. The unit 2B0 outputs
to the alpha-test unit 2C0, which outputs to the color-test unit 2D0, which outputs to the
stencil/z-test unit 2E0, which outputs to the blending/dithering unit 2F0. The stencil/z-test
unit 2E0 also communicates with the per-sample z and stencil buffers 2I0, while the
blending/dithering unit 2F0 and the logical-operations unit 2H0 both communicate with
the per-sample color buffers 2J0.
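
The chaining of test stages can be pictured with the Python sketch below, in which each
stage either passes a fragment on or discards it. Only two of the tests are shown, the
fragment is modeled as a dictionary, and the alpha comparison is assumed to be
"greater than"; all of these, along with the function names, are illustrative assumptions
rather than a description of the units 2A0 through 2H0.

```python
def run_pipeline(fragment, stages):
    """Pass a fragment through the stages in order; a stage returns None to discard."""
    for stage in stages:
        fragment = stage(fragment)
        if fragment is None:
            return None
    return fragment

def scissor_test(rect):
    """Build a stage that keeps fragments inside the rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    def stage(frag):
        return frag if x0 <= frag["x"] < x1 and y0 <= frag["y"] < y1 else None
    return stage

def alpha_test(reference):
    """Build a stage that keeps fragments whose alpha exceeds the reference."""
    def stage(frag):
        return frag if frag["color"][3] > reference else None
    return stage
```

A fragment discarded by an early stage never reaches the later, more expensive stages.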

The components of the pipeline 2M0, the scissor-, stipple-, alpha-, color-
and stencil/Z-test units 2A0, 2B0, 2C0, 2D0, 2E0 and the blending, dithering and
logical-operations units 2F0, 2G0, 2H0 all receive input from the stipple and mode caches
260, 270. The stencil/Z-test unit 2E0 also receives inputs from the depth-interpolation unit
2L0.

Towards the back-end side, the pixel-out unit 280 communicates with the
per-sample z, stencil and color buffers 2I0, 2J0 as well as with the per-pixel buffers 2K0.
The per-pixel buffers 2K0 and the back-end block 84C are in communication.

As mentioned above, the pixel block 84B communicates with the Phong,
mode-injection and back-end blocks 84A, 847, 84C. More particularly, the pixel block
84B receives input from the mode-injection and Phong blocks 847, 84A. The pixel block
84B receives VSPs and mode data from the mode-injection block 847 and receives
fragment colors for the VSPs from the Phong block 84A. (The Phong block 84A may also
supply per-fragment depth or stencil values for VSPs.) The fragment colors for the VSPs
arrive at the pixel block 84B in the same order as the VSPs.

The pixel block 84B processes the data for each visible sample according to
maintained mode settings. When the pixel block 84B finishes processing all stamps for the
current tile, it signals the pixel-out unit 280 to output the color, z and stencil buffers for the
tile.

The pixel-out unit 280 processes the pixel samples to generate color, z and
stencil values for the pixels. These pixel values are sent to the back-end block 84C which
has the memory controller for the framebuffer 84G. The back-end block 84C prepares the
current tile buffers for rendering of geometry (VSPs) by the pixel block 84B. This may
involve loading of the existing color, z and stencil values from the framebuffer 84G.

In one embodiment, the on-chip per-sample z, stencil and color buffers 2I0,
2J0 are double buffered. Thus, while the pixel-out unit 280 is sending one tile to the
back-end block 84C, the depth and blend units 2E0, 2F0 can write to a second tile. The
per-sample color, z and stencil buffers 2I0, 2J0 are large enough to store one tile's worth
of data.

There is also a set of per-pixel z, stencil and color buffers 2K0 for each tile.
These per-pixel buffers 2K0 are an intermediate storage interfacing with the back-end
block 84C.

The pixel block 84B also receives some packets bound for the back-end
block 84C from the mode-injection block 847. The input filter 220 appropriately passes
these packets on to (the prefetch queue of) the back end 84C, where they are processed in
the order received. Some packets are also sent to (the input queue in) the pixel-out unit
280.

As mentioned before, the pixel block 84B receives input from the
mode-injection and Phong blocks 847 and 84A. There are two input queues to handle these
two inputs. The data packets from the mode-injection block 847 go to the VSP queue 240
and the fragment color (and depth or stencil if enabled) packets from the Phong block 84A
go to the color queue 230. The mode-injection block 847 places the data packets in the
input FIFO 210. The input filter 220 examines the packet header and sends the data bound
for the back-end block 84C to the back-end block 84C and the data packets needed by the
pixel block 84B to the VSP queue 240. The majority of the packets received from the
mode-injection block 847 are bound for the VSP queue 240, some go only to the back-end
block 84C and some are copied into the VSP queue 240 as well as sent to the back-end and
the pixel-out units 84C, 280.
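
The header-based routing performed by the input filter can be sketched in Python as
follows. The routing table is hypothetical: the description states only that packets are
directed by header to the VSP queue, the back end, the pixel-out unit or several of these,
so the destination assigned here to the Clear packet in particular is an assumption made
for illustration.

```python
from collections import deque

# Hypothetical destination sets keyed by packet type (see the caveat above).
ROUTES = {
    "VSP": ("vsp_queue",),
    "Begin_Frame": ("vsp_queue",),
    "Prefetch_Begin_Frame": ("back_end", "pixel_out"),
    "Clear": ("vsp_queue", "back_end", "pixel_out"),
}

def input_filter(packets):
    """Examine each packet header and copy the packet to its destination queues."""
    queues = {"vsp_queue": deque(), "back_end": deque(), "pixel_out": deque()}
    for packet in packets:
        # Unknown packet types default to the VSP queue in this sketch.
        for dest in ROUTES.get(packet["header"], ("vsp_queue",)):
            queues[dest].append(packet)
    return queues
```

Because each destination is a FIFO, packets reach every consumer in the order in which
they were received.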

A brief explanation of the need and mechanism for tile preparation follows.
A typical rendering sequence may have the following operations: (1) initialize the color, z
and stencil buffers 2J0, 2I0 to their clear values, if needed, (2) blt background image(s)
into the buffer(s) 2J0, 2I0, if needed, (3) render geometry, (4) blt again, (5) render some
more geometry, (6) complete and flip. If the blt operation (2) covers the entire window, a
clearing operation for that buffer may not be needed. If the blt covers only part of the
window, a clear may be needed. Furthermore, the initialization and blt (2) operations may
happen in reverse order. That is to say, there may be a blt to (perhaps) the whole window
followed by a clearing of a part of the window. The pre-geometry blts that cover the entire
window do not require a scissor test. Tile alignment and scaling may be carried out by the
back-end block 84C as the image is read back into the tile buffers. The post-geometry blts
and the blts that cover part of the window or involve scaling are implemented as textured
primitives in the pipeline.

Similarly, the clear operation is broken into two kinds. The pre-geometry
entire-window-clear operation is carried out in the pixel-out unit 280, and the clear
operation that covers only part of the window (and/or is issued after some geometry has
been rendered) is carried out in the pixel-block pipeline. Both the pixel block 84B (the
pixel-out unit 280) and the back-end block 84C are aware of the write masks for various
buffers at the time the operation is invoked. In fact, the back-end block 84C uses the write
masks to determine if it needs to read back the tile buffers. The readback of tile buffers
may also arise when the rendering of a frame causes the polygon or sort memory 84E, 84D
to overflow.

In some special cases, the pipeline may break a user frame into two or more
sequential frames. This may happen due to a context switch or due to overflow of the
polygon or sort memory 84E, 84D. Thus, for the same user frame, a tile may be visited more
than once in the pixel block 84B. The first time a tile is encountered, the pixel block 84B
(most likely the pixel-out unit 280) may need to clear the tile buffers 2I0, 2J0 with the
"clear values" prior to rendering. For rendering the tiles in subsequent frames, the pixel
color, z and stencil values are read back from the framebuffer memory 84G.

Another very likely scenario occurs when the z buffer 2I0 is cleared and the
color and stencil buffers 2J0, 2I0 are loaded into tiles from a pre-rendered image. Thus, as
a part of the tile preparation, two things happen. The background image is read back from
the framebuffer memory 84G into the buffers that are not enabled for clear, and the enabled
buffers (corresponding to the color, z and stencil) are cleared. The pipeline stages
upstream from the pixel block 84B are aware of these functional capabilities, since they are
responsible for sending the clear information.
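
The two-part tile preparation described above can be sketched in Python as a per-buffer
choice between clearing and reading back. The dictionary layout and the name
`prepare_tile` are hypothetical; the sketch only mirrors the rule that buffers enabled for
clear receive their clear values while the others are loaded from the framebuffer.

```python
def prepare_tile(clear_enabled, clear_values, framebuffer_tile):
    """Return the initial tile buffers: cleared where enabled, read back otherwise.

    clear_enabled    -- e.g. {"color": False, "z": True, "stencil": False}
    clear_values     -- clear value per buffer name
    framebuffer_tile -- existing per-buffer contents for this tile
    """
    tile = {}
    for name in ("color", "z", "stencil"):
        if clear_enabled[name]:
            tile[name] = clear_values[name]
        else:
            tile[name] = framebuffer_tile[name]  # read back the background image
    return tile
```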

The pixel block 84B compares the z values of the incoming samples to those
of the existing samples to decide which samples to keep. The pixel block 84B also
provides the capability to minimize any color bleeding artifacts that may arise from the
splitting of a user frame.

Data Structures 

- Samples, Pixels, Stamps and Tiles 
A first data structure is a sample. Each pixel in a VSP is divided into a
number of samples. Given a pixel divided into an n-by-m grid, a sample corresponds to
one of the n*m subdivisions. FIG. 4 illustrates the relationship of samples to pixels and
stamps in one embodiment.

The choices of n and m, as well as how many and which subdivisions to
select as samples are all programmable in the co-processor 840. The grid, sample count
and sample locations, however, are fixed until changed. Default n, m, count and locations
are set at reset. FIG. 4 also illustrates the default sample grid, count and locations
according to one embodiment.

Each sample has a dirty bit, indicating whether either of the sample's color 
or alpha value has changed in the rendering process. 

A next data structure is a stamp. A stamp is a j-by-k multi-pixel grid
within an image. In one embodiment, a stamp is a 2x2-pixel area.

A next data structure is a tile. A tile is an h-by-i multi-stamp area within an
image. In one embodiment, a tile is an 8x8-stamp area, that is to say, a 16x16-pixel area of
an image.
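
The stamp and tile sizes of this embodiment can be summarized in a short C sketch. Only the 2x2-pixel stamp and the 8x8-stamp tile come from the text; the constant, type and function names are illustrative assumptions, as is the row-by-row numbering of the pixels within a stamp.

```c
#include <stdint.h>

/* Sizes from the embodiment described above; the names are hypothetical. */
enum {
    PIXELS_PER_STAMP_SIDE = 2,   /* a stamp is a 2x2-pixel area */
    STAMPS_PER_TILE_SIDE  = 8,   /* a tile is an 8x8-stamp area */
    PIXELS_PER_TILE_SIDE  = PIXELS_PER_STAMP_SIDE * STAMPS_PER_TILE_SIDE
};

/* Position of a pixel inside a tile, in pixels. */
typedef struct {
    uint32_t x;
    uint32_t y;
} pixel_in_tile;

/* Map (stamp index in tile, pixel index in stamp) to a pixel position.
   Assumption: bit 0 of pixel_index selects the column and bit 1 the row. */
static pixel_in_tile locate_pixel(uint32_t stamp_x, uint32_t stamp_y,
                                  uint32_t pixel_index)
{
    pixel_in_tile p;
    p.x = stamp_x * PIXELS_PER_STAMP_SIDE + (pixel_index & 1u);
    p.y = stamp_y * PIXELS_PER_STAMP_SIDE + ((pixel_index >> 1) & 1u);
    return p;
}
```

With these sizes, the last pixel of the last stamp in a tile lands at position (15, 15), consistent with a 16x16-pixel tile.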

A next data structure is a packet. A packet is a structure for transferring
information. Each packet consists of a header followed by packet data. The header
indicates the type and format of the data that the packet contains.

The following packet types are described in detail herein:
Begin_Frame, Prefetch_Begin_Frame, Begin_Tile, Prefetch_Begin_Tile, End_Frame and
Prefetch_End_Frame, Clear, pixel-mode Cache_Fill, stipple Cache_Fill, VSP, Color and
Depth.

- The Begin_Frame and Prefetch_Begin_Frame Packets

Begin_Frame and Prefetch_Begin_Frame packets have the same content
except that their headers differ. A Begin_Frame packet signals the beginning of a user
frame and goes to the pixel block 84B (the VSP queue 240). The Prefetch_Begin_Frame
packet signals the beginning of a frame and is dispatched to the back-end block 84C (the
back-end block input queue) and the pixel-out block prefetch queues.

For every Begin_Frame packet, there is a corresponding End_Frame packet.
However, multiple End_Frame packets may correspond to the same user frame. This can
happen due to frame splitting on overflow, for example.

Table 1 illustrates the format in one embodiment of the Begin_Frame and
Prefetch_Begin_Frame packets. They contain Blocking_Interrupt, Window_X_Offset,
Window_Y_Offset, Pixel_Format, No_Color_Buffer, No_Z_Buffer, No_Saved_Z_Buffer,
No_Stencil_Buffer, No_Saved_Stencil_Buffer, Stencil_Mode, Depth_Output_Selection,
Color_Output_Selection, Color_Output_Overflow_Selection and Vertical_Pixel_Count
fields. A description of the fields follows.

Software uses the Block_3D_Pipe field to instruct the back-end block 84C
to generate a blocking interrupt.

The WinSourceL, WinSourceR, WinTargetL and WinTargetR fields
identify the window IDs of various buffers. The back end 84C uses them for
pixel-ownership tests.

The Window_X_Offset and Window_Y_Offset are also for the back end
84C (for positioning the BLTs and such).

The Pixel_Format field specifies the format of pixels stored in the
framebuffer 84G. The pixel block 84B uses this for format conversion in the pixel-out unit
280. One embodiment supports 4 pixel formats, namely 32-bits-per-pixel ARGB,
32-bits-per-pixel RGBA, 16-bits-per-pixel RGB_5_6_5, and 8-bits-per-pixel indexed color
buffer formats.

The SrcEqTarL and SrcEqTarR fields indicate the relationship between the
source window to be copied as background in the left and right target buffers. The back
end 84C uses them.

The No_Color_Buffer flag, if set, indicates that there is no color buffer and,
thus, disables color buffer operations (such as blending, dithering and logical operations)
and updates.

The No_Saved_Color_Buffer flag, if set, disables color output to the
framebuffer 84G. The color values generated in the pixel block 84B are not to be saved in
the framebuffer because there is no color buffer for this window in the framebuffer 84G.

The No_Z_Buffer flag, if set, indicates there is no depth buffer and, thus,
disables all depth-buffer operations and updates.

The No_Saved_Z_Buffer flag, if set, disables depth output to the
framebuffer 84G. The depth values generated in the pixel block 84B are not to be saved in
the framebuffer 84G because there is no depth buffer for this window in the framebuffer
84G.

The No_Stencil_Buffer flag, if set, indicates there is no stencil buffer and,
thus, disables all stencil operations and updates.

The No_Saved_Stencil_Buffer flag, if set, disables stencil output to the
framebuffer 84G. The stencil values generated in the pixel block 84B are not to be saved
in the framebuffer 84G because there is no stencil buffer for this window in the
framebuffer 84G.

The Stencil_Mode flag, if set, indicates the stencil operations are on a
per-sample basis (with 2 bits/sample, according to one embodiment) versus a per-pixel basis
(with 8 bits per pixel, according to that embodiment).

The pixel block 84B processes depth values on a per-sample basis but 
outputs them on a pixel basis. The Depth_Output_Selection field determines how the pixel 
block 84B chooses the per-pixel depth value from amongst the per-sample depth values. 

In one embodiment, the field values are FIRST, NEAREST and 
FARTHEST. FIRST directs the selection of the depth value of the sample numbered 0 
(that is, the first sample, in a zero-indexed counting schema) as the per-pixel depth value. 
NEAREST directs the selection of the depth value of the sample nearest the viewpoint as 
the per-pixel depth value. Similarly, FARTHEST directs the selection of the depth value 
of the sample farthest from the viewpoint as the per-pixel depth value. 
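
The three selection rules can be illustrated with a small C sketch. The enum encoding and function name are assumptions made for illustration, and the sketch assumes that a smaller z value means a sample nearer the viewpoint, which the text does not state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding of the Depth_Output_Selection field values. */
typedef enum { DEPTH_FIRST, DEPTH_NEAREST, DEPTH_FARTHEST } depth_output_selection;

/* Pick the per-pixel depth from the per-sample depths.
   Assumes count >= 1 and that a smaller z is nearer the viewpoint. */
static uint32_t select_pixel_depth(const uint32_t *z, size_t count,
                                   depth_output_selection mode)
{
    uint32_t result = z[0];            /* FIRST: the sample numbered 0 */
    if (mode == DEPTH_FIRST)
        return result;
    for (size_t i = 1; i < count; i++) {
        /* NEAREST keeps the minimum, FARTHEST keeps the maximum. */
        if (mode == DEPTH_NEAREST ? z[i] < result : z[i] > result)
            result = z[i];
    }
    return result;
}
```

For sample depths {50, 20, 90, 40}, this sketch yields 50 for FIRST, 20 for NEAREST and 90 for FARTHEST.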

When a frame overflow has not occurred, the Color_Output_Selection field
determines the criterion for combining the sample colors into pixels for color output.
However, when a frame overflow does occur, the Color_Output_Overflow_Selection field
determines the criterion for combining the sample colors into pixels for color output. In
one embodiment, the Color_Output_Selection and Color_Output_Overflow_Selection state
parameters have a value of FIRST_SAMPLE, WEIGHTED, DIRTY_SAMPLES or
MAJORITY. FIRST_SAMPLE directs the selection of the color of the first sample as the
per-pixel color value. WEIGHTED directs the selection of a weighted average of the
pixel's sample colors as the per-pixel color value. DIRTY_SAMPLES directs the selection
of the average color of the dirty samples, and MAJORITY directs the selection of (1) the
average of the samples' source colors for dirty samples or (2) the average of the samples'
buffer colors for non-dirty samples, whichever of the dirty-sample and clean-sample
groups is the more numerous.
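
A rough C sketch of the four selection criteria follows, for a single color channel. It is an illustration under several assumptions not stated in the text: each sample stores its source color if dirty and its buffer color otherwise, the weights of a pixel sum to 1, a tie in MAJORITY goes to the dirty group, and an empty group falls back to the first sample's color. All type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { FIRST_SAMPLE, WEIGHTED, DIRTY_SAMPLES, MAJORITY } color_output_selection;

typedef struct {
    float color;   /* one color channel of the sample */
    float weight;  /* fractional weight; assumed to sum to 1 per pixel */
    bool  dirty;   /* set if the sample changed during rendering */
} sample;

/* Average the channel over samples whose dirty flag equals want_dirty.
   Returns fallback when no sample matches (an assumption). */
static float average_group(const sample *s, size_t n, bool want_dirty, float fallback)
{
    float sum = 0.0f;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].dirty == want_dirty) { sum += s[i].color; count++; }
    }
    return count ? sum / (float)count : fallback;
}

/* Resolve the samples of one pixel (n >= 1) into a per-pixel channel value. */
static float resolve_channel(const sample *s, size_t n, color_output_selection mode)
{
    size_t dirty = 0;
    float sum = 0.0f;
    switch (mode) {
    case FIRST_SAMPLE:
        return s[0].color;
    case WEIGHTED:
        for (size_t i = 0; i < n; i++) sum += s[i].weight * s[i].color;
        return sum;
    case DIRTY_SAMPLES:
        return average_group(s, n, true, s[0].color);
    case MAJORITY:
        for (size_t i = 0; i < n; i++) dirty += s[i].dirty;
        /* Ties go to the dirty group here; the text does not specify. */
        return average_group(s, n, dirty * 2 >= n, s[0].color);
    }
    return s[0].color;
}
```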

The Vertical_Pixel_Count field specifies the number of pixels vertically 
across the window. 

The StencilFirst field determines how the sample stencil values are
converted to the stencil value of the pixel. If StencilFirst is set, then the pixel block assigns
the stencil value of the sample numbered 0 (that is, the first sample, in a zero-indexed
counting schema) as the per-pixel stencil value. Otherwise, majority rule is used in
determining how the pixel stencil value gets updated and assigned.

- The End_Frame and Prefetch_End_Frame Packets

End_Frame and Prefetch_End_Frame indicate the end of a frame. The
Prefetch_End_Frame packet is sent to the back-end prefetch queue and the End_Frame
packet is placed in the VSP queue 240.

Table 2 describes the format in one embodiment of the End_Frame and
Prefetch_End_Frame packets. (The packet header values differ, of course, in order to
distinguish the two types of packets.) They contain a packet header and Interrupt_Number,
Soft_End_Frame and Buffer_Over_Occurred fields.

The Interrupt_Number is used by the back end 84C. 

The Soft_End_Frame and Buffer_Over_Occurred fields each independently
indicate the splitting of a user frame into multiple frames. Software can cause an end of
frame without starting a new user frame by asserting Soft_End_Frame. The effect is
exactly the same as with the Buffer_Over_Occurred field, which is set when the
mode-extraction unit 843 overflows a memory 84D, 84E.

- The Begin_Tile and Prefetch_Begin_Tile Packets

Begin_Tile and Prefetch_Begin_Tile packets indicate the end of the
previous tile, if any, and the beginning of a new tile. Each pass through a tile begins with a
Begin_Tile packet. The sort block 844 outputs this packet type for every tile in a window
that has some activity.

Table 5 describes the format, in one embodiment, of the Begin_Tile and
Prefetch_Begin_Tile packets. (The packet header values differ, of course, in order to
distinguish the two types of packets.) They contain First_Tile_In_Frame, Breakpoint_Tile,
Begin_SuperTile, Tile_Right, Tile_Front, Tile_Repeat, Tile_Begin_SubFrame and
Write_Tile_ZS flags, as well as Tile_X_Location and Tile_Y_Location fields. The
Begin_Tile and Prefetch_Begin_Tile packets also contain Clear_Color_Value,
Clear_Depth_Value, Clear_Stencil_Value, Backend_Clear_Color, Backend_Clear_Depth,
Backend_Clear_Stencil and Overflow_Frame fields. A description of the fields follows.

The First_Tile_In_Frame flag indicates that the sort block 844 is sending
the data for the first tile in the frame. (Performance counters for the frame can be
initialized at this time.) If this tile has multiple passes, the First_Tile_In_Frame flag is
asserted only in the first pass.

Breakpoint_Tile indicates the breakpoint mechanism for the pipeline 840 is
activated.

Begin_SuperTile indicates that the sort block 844 is sending the data for the
first tile in a super-tile quad. (Performance counters related to the super-tile can be
initialized at this time.)

(The pixel block 84B does not use the Tile_Right, Tile_Front, Tile_Repeat,
Tile_Begin_SubFrame and Write_Tile_ZS flags.)

Tile_X_Location and Tile_Y_Location specify the starting x and y
locations, respectively, of the tile within the window. These parameters are specified as
tile counts.

Clear_Color_Value, Clear_Depth_Value and Clear_Stencil_Value specify
the values the draw, z- and stencil-buffer pixel samples receive on a respective clear
operation. The Backend_Clear_Color, Backend_Clear_Depth and Backend_Clear_Stencil
flags indicate whether the back-end block 84C is to clear the respective draw, z- and/or
stencil buffers. When a flag is TRUE, the back end 84C does not read the respective
information from the framebuffer 84G. The pixel block 84B actually performs the clear
operation.

Backend_Clear_Color indicates whether the pixel-out unit 280 is to clear
the draw buffer. If this flag is set, the back end 84C does not read in the color buffer
values. Instead, the pixel-out unit 280 clears the color tile to Clear_Color_Value.
Conversely, if the flag is not set, the back-end block 84C reads in the color buffer values.

The Backend_Clear_Depth field indicates whether the pixel-out unit 280 is
to clear the z buffer. The pixel-out unit 280 initializes each pixel sample on the tile to the
Clear_Depth_Value before the pixel block 84B processes any geometry. If this bit is not
set, the back-end block 84C reads in the z values from the framebuffer memory.

The Backend_Clear_Stencil field indicates the stencil-buffer bits that the
pixel-out unit 280 is to clear. The back-end block 84C reads the stencil values from the
framebuffer memory if this flag is not set. The pixel-out unit 280 clears the stencil pixel
buffer to the Clear_Stencil_Value.

The Overflow_Frame flag indicates whether this tile is a result of an
overflow in the mode-extraction block 843, that is to say, whether the current frame is a
continuation of the same user frame as the last frame. If this bit is set,
Color_Output_Overflow_Selection determines how the pixel-color value is output. If the
flag is not set, Color_Output_Selection determines how the pixel-color value is output.

Tile_Begin_SubFrame is used to split the data within the tile into multiple
sub-frames. The data within each sub-frame may be iteratively processed by the pipeline
for sorted transparency, anti-aliasing, or other multi-pass rendering operations.

- The Clear Packet

The Clear packet indicates that the pixel block 84B needs to clear a tile.
This packet goes to the VSP queue 240.

Table 4 illustrates the format in one embodiment of a Clear packet. It
contains Header, Mode_Cache_Index, Clear_Color, Clear_Depth, Clear_Stencil,
Clear_Color_Value, Clear_Depth_Value and Clear_Stencil_Value fields.

Clear_Color indicates whether the pixel block 84B is to clear the color
buffer, setting all values to Clear_Color_Value or Clear_Index_Value, depending on
whether the window is in indexed color mode.

Clear_Depth and Clear_Stencil indicate whether the pixel block 84B is to
clear the depth and/or stencil buffer, setting values to Clear_Depth_Value and/or
Clear_Stencil_Value, respectively.

- The Pixel-Mode Cache_Fill Packet

A pixel-mode Cache_Fill packet contains the state information that may
change on a per-object basis. While all the fields of an object-mode Cache_Fill packet will
seldom change with every object, any one of them can change depending on the object
being rendered.

Tables 6 and 7 illustrate the format and content in one embodiment of a
pixel-mode Cache_Fill packet. The packet contains Header, Mode_Cache_Index,
Scissor_Test_Enabled, x_Scissor_Min, x_Scissor_Max, y_Scissor_Min, y_Scissor_Max,
Stipple_Test_Enabled, Function_ALPHA, alpha_REFERENCE, Alpha_Test_Enabled,
Function_COLOR, color_MIN, color_MAX, Color_Test_Enabled, stencil_REFERENCE,
Function_STENCIL, Function_DEPTH, mask_STENCIL, Stencil_Test_Failure_Operation,
Stencil_Test_Pass_Z_Test_Failure_Operation, Stencil_and_Z_Tests_Pass_Operation,
Stencil_Test_Enabled, write_mask_STENCIL, Z_Test_Enabled, Z_Write_Enabled,
DrawStencil, write_mask_COLOR, Blending_Enabled, Constant_Color_BLEND,
Source_Color_Factor, Destination_Color_Factor, Source_Alpha_Factor,
Destination_Alpha_Factor, Color_LogicBlend_Operation, Alpha_LogicBlend_Operation
and Dithering_Enabled fields. A description of the fields follows.

Mode_Cache_Index indicates the index of the entry in the mode cache 270 
this packet's contents are to replace. 

Scissor_Test_Enabled, Stipple_Test_Enabled, Alpha_Test_Enabled,
Color_Test_Enabled, Stencil_Test_Enabled and Z_Test_Enabled are the respective enable
flags for the scissor, stipple, alpha, color, stencil and depth tests. Dithering_Enabled
enables the dithering function.

x_Scissor_Min, x_Scissor_Max, y_Scissor_Min and y_Scissor_Max specify the left, right,
top and bottom edges, respectively, of the rectangular region of the scissor test.

Function_ALPHA, Function_COLOR, Function_STENCIL and Function_DEPTH indicate the
respective functions for the alpha, color, stencil and depth tests.

alpha_REFERENCE is the reference alpha value used in the alpha test.

color_MIN and color_MAX are, respectively, the minimum inclusive and
maximum inclusive values for the color key.

stencil_REFERENCE is the reference value used in the stencil test.

mask_STENCIL is the stencil mask with which to AND the reference and buffer sample
stencil values prior to testing.
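
The masking step can be sketched in C as follows. The text does not enumerate the comparison functions, so the OpenGL-style set of codes and the operand order (reference on the left) used here are assumptions, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical comparison codes; the text does not enumerate them. */
typedef enum { CMP_NEVER, CMP_LESS, CMP_EQUAL, CMP_LEQUAL,
               CMP_GREATER, CMP_NOTEQUAL, CMP_GEQUAL, CMP_ALWAYS } compare_fn;

/* Apply a comparison code with the reference value as the left operand. */
static bool compare(compare_fn fn, uint32_t ref, uint32_t val)
{
    switch (fn) {
    case CMP_NEVER:    return false;
    case CMP_LESS:     return ref <  val;
    case CMP_EQUAL:    return ref == val;
    case CMP_LEQUAL:   return ref <= val;
    case CMP_GREATER:  return ref >  val;
    case CMP_NOTEQUAL: return ref != val;
    case CMP_GEQUAL:   return ref >= val;
    case CMP_ALWAYS:   return true;
    }
    return false;
}

/* Stencil test: AND both operands with the stencil mask, then compare. */
static bool passes_stencil_test(compare_fn fn, uint8_t reference,
                                uint8_t buffer_value, uint8_t mask)
{
    return compare(fn, (uint32_t)(reference & mask), (uint32_t)(buffer_value & mask));
}
```

With a mask of 0x0F, for example, a reference of 0x12 and a buffer value of 0xF2 compare as equal, because only the low four bits take part in the test.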

Stencil_Test_Failure_Operation indicates the action to take on failure of the
stencil test. Likewise, Stencil_Test_Pass_Z_Test_Failure_Operation indicates the action to
take on passage of the stencil test and failure of the depth test, and
Stencil_and_Z_Tests_Pass_Operation the action to take on passage of both the stencil and
depth tests.

The write_mask_STENCIL field is the stencil mask for the stencil bits in the
buffer that are updated.

Z_Write_Enabled is a Boolean value indicating whether writing and
updating of the depth buffer is enabled.

The DrawStencil field indicates that the pixel block 84B is to interpret the
second data value from the Phong block 84A as stencil data.

write_mask_COLOR is the mask of bitplanes in the draw buffer that are enabled.
In color-index mode, the low-order 8 bits are the IndexMask.

Blending_Enabled indicates whether blending is enabled. If blending is
enabled, then logical operations are disabled.

Constant_Color_BLEND is the constant color for blending.

The Source_Color_Factor and Destination_Color_Factor fields are,
respectively, the multipliers for source-derived and destination-derived sample colors.
Source_Alpha_Factor is the multiplier for sample alpha values, while
Destination_Alpha_Factor is a multiplier for sample alpha values already in the tile buffer.

The Color_LogicBlend_Operation indicates the logic or blend operation for
color values, and Alpha_LogicBlend_Operation indicates the logic or blend operation for
alpha values.

- The Stipple Cache_Fill Packet

A next data structure is the stipple Cache_Fill packet.
Table 10 illustrates the structure and content of a stipple Cache_Fill packet
according to one embodiment. The packet contains Stipple_Cache_Index and
Stipple_Pattern fields. The Stipple_Cache_Index field indicates which of the stipple
cache's entries to replace. The Stipple_Pattern field holds the stipple pattern.

In one embodiment, the stipple cache 260 has four entries, and thus the
bit-size of the Stipple_Cache_Index is 2. (OpenGL sets the size of a stipple pattern to 1024
bits.)

- The VSP Packet

Each visible stamp in a primitive has a corresponding VSP packet. Table 3
describes the format of a VSP packet according to one embodiment. It contains
Mode_Cache_Index, Stipple_Cache_Index, Stamp_X_Index, Stamp_Y_Index,
Sample_Coverage_Mask, Zreference, DzDx, DzDy and Is_MultiSample fields. Zreference is
a reference z value, and DzDx and DzDy are two depth slopes, dz/dx and dz/dy.
Is_MultiSample is a flag. A description of the fields follows.

A VSP packet contains indices for the mode and stipple cache entries in the
mode and stipple caches 270, 260 that are currently active: Mode_Cache_Index and
Stipple_Cache_Index. (The Phong block 84A separately supplies the color data for the
VSP.)

In one embodiment, the stipple cache 260 has four entries, and thus the
bit-size of the Stipple_Cache_Index field is two. The mode cache 270 has sixteen entries, and
the bit-size of the Mode_Cache_Index field is four.

A VSP packet also contains Stamp_X_Index, Stamp_Y_Index and
Is_MultiSample values. The Stamp_X_Index indicates the x index within a tile, while the
Stamp_Y_Index indicates the y index within the tile. The Is_MultiSample flag indicates
whether the rendering is anti-aliased or non-anti-aliased. This allows programmatic control
for primitive-based anti-aliasing.

In one embodiment, sixty-four stamps compose an 8x8-stamp tile. The bit
sizes of the Stamp_X_Index and Stamp_Y_Index are thus three. With 16x16-pixel tiles
and 2x2-pixel stamps, for example, the stamp indices range from 0 to 7.

A VSP packet also contains the sample coverage mask for a VSP,
Sample_Coverage_Mask. Each sample in a stamp has a corresponding bit in a coverage
mask. All visible samples have their bits set in the Sample_Coverage_Mask.

In one embodiment, sixteen samples compose a stamp, and thus the bit size
of the Sample_Coverage_Mask is sixteen.
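
With sixteen samples per 2x2-pixel stamp, each pixel owns four bits of the coverage mask. The C sketch below shows how a per-pixel sample mask could be extracted. It assumes pixel 0 occupies the low-order four bits, which the text does not state; the function names are hypothetical.

```c
#include <stdint.h>

enum { SAMPLES_PER_PIXEL = 4, PIXELS_PER_STAMP = 4 };

/* Extract the 4-bit sample mask of one pixel from a 16-bit stamp coverage mask.
   Assumption: pixel 0 occupies the low-order bits of the mask. */
static uint8_t pixel_sample_mask(uint16_t coverage_mask, unsigned pixel_index)
{
    return (uint8_t)((coverage_mask >> (pixel_index * SAMPLES_PER_PIXEL)) & 0xFu);
}

/* Count the visible samples (set bits) in a stamp's coverage mask. */
static unsigned visible_sample_count(uint16_t coverage_mask)
{
    unsigned count = 0;
    while (coverage_mask) {
        count += coverage_mask & 1u;
        coverage_mask >>= 1;
    }
    return count;
}
```

A pixel whose extracted mask is zero has no visible samples and needs no further per-pixel processing.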

The z values of all samples in a stamp are computed with respect to the
Zreference value, DzDx and DzDy.

In one embodiment, the Zreference value is a signed fixed-point value with
28 integer and 3 fractional bits (s28.3), and DzDx and DzDy are signed fixed-point (s27)
values. These bit precisions are adequate for resulting 24-bits-per-sample depth values.

The Is_MultiSample flag indicates if the rendering is antialiased or
non-antialiased. This field allows primitive-based anti-aliasing.

Zreference, DzDx and DzDy values are passed on to the mode-injection
block 847 from the cull block 846. The mode-injection block 847 sends these down to the
pixel block 84B. The Pixel Depth packets arriving from the Phong block 84A are written
into the color queue 230.

- Color Packet

A Color packet gives the color values (that is to say, RGBA values) for a
visible pixel in a stamp.

Table 8 illustrates the form and content of a Color packet according to one
embodiment. Such a packet includes a Header and a Color field. In one embodiment, a
color value has 32 bits distributed evenly over the red, green, blue and alpha values.

- Depth/Stencil Information

A Depth packet conveys per-pixel depth or stencil information. Table 9
illustrates the form and content of a Depth packet according to one embodiment. Such a
packet contains Header and Z fields. In one embodiment, the Z field is a 24-bit value
interpreted as fragment stencil or fragment depth, depending on the setting of the
DrawStencil flag in the applicable pixel mode.

- State Parameters 

The pixel block 84B maintains a number of state parameters that affect its
operation. Tables 22 and 23 list the state parameters according to one embodiment. These
state parameters correspond to their like-named packet fields. As such, the packet-field
descriptions apply to the state parameters, and a repetition of the descriptions is omitted.

The exceptions are SampleLocations, SampleWeights, and EnableFlags.
SampleLocations are the locations of the samples in the pixel specified on the 16x16
sub-pixel grid. SampleWeights are the fractional weights assigned to the samples. These
weights are used in resolving the sample colors into pixel colors. An alternate embodiment
could include these fields in some of the state packets (such as the Begin_Frame or
Begin_Tile packet) to allow dynamic update of these parameters under software control for
synchronous update with other processing.

The EnableFlags include the Alpha_Test_Enabled, Color_Test_Enabled,
Stencil_Test_Enabled, Z_Test_Enabled, Scissor_Test_Enabled, Stipple_Test_Enabled,
Blending_Enabled and Dithering_Enabled Boolean values.

Protocols 

The mode-injection and Phong blocks 847, 84A send input to the pixel
block 84B by writing packets into its input queues 210. The pixel block 84B also
communicates with the back-end block 84C, sending completed pixels to the framebuffer
84G and reading pixels back from the framebuffer 84G to blend with incoming fragments.
(The pixel block 84B sends and receives a tile's worth of pixels at a time.)

The functional units within the pixel block 84B are described below. As
color, alpha and stipple values are per-fragment data, the results of corresponding tests
apply to all samples in the fragment. The same is true of the scissor test as well.

The pseudo-code for the data flow for one embodiment based on the
per-fragment and per-sample computations is outlined below. This pseudo-code provides an
overview of the operations of the pixel block 84B. The pseudo-code includes specific
assumptions such as the size of the sub-pixel grid, the number of samples, etc. These and
other fixed parameters are implementation dependent.

DoPixel() {
    for each stamp {
        for each pixel in the stamp {
            /* compute sample mask for pixel */
            mask_PIXEL = mask_SAMPLE & 0xF;
            mask_SAMPLE >>= 4;

            if (mask_PIXEL == 0)
                /* none of the samples is set */
                break;
            else if (Scissor_Test_Enabled && (!Passes_Scissor_Test()))
                break;
            else if (Stipple_Test_Enabled && (!Passes_Stipple_Test()))
                break;
            else if (Alpha_Test_Enabled && (!Passes_Alpha_Test()))
                break;
            else if (Color_Test_Enabled && (!Passes_Color_Test()))
                break;
            else if (Stencil_Test_Enabled && !No_Stencil_Buffer) {
                if (Stencil_Mode) {
                    /* per-pixel stencil */
                    if (!Passes_Pixel_Stencil_Test()) {
                        doPixel_Stencil_Test_Failed_Operation();
                        break;
                    } else {
                        Passes_Pixel_Z_Test();
                    }
                } else {
                    /* per-sample stencil */
                    for each sample in the pixel {
                        Is_Valid_Sample = mask_PIXEL & 0x1;
                        mask_PIXEL >>= 1;
                        if (Is_Valid_Sample) {
                            if (!Passes_Sample_Stencil_Test()) {
                                doSample_Stencil_Test_Failed_Operation();
                                break;
                            } else if (Z_Test_Enabled
                                       && (!Passes_Sample_Z_Test())) {
                                doSampleStencil_Test_Passed_Z_Test_Failed_Operation();
                            } else {
                                doSampleStencil_and_Z_Tests_Passed_Operation();
                            }
                        }
                    } /* for each sample in pixel */
                }
            } else {
                /* if (!Stencil_Test_Enabled || No_Stencil_Buffer) */
                doPixelDepthTest();
            }
        } /* for each pixel in stamp */
    } /* for each stamp */
} /* DoPixel() */



doPixelDepthTest() {
    boolean Is_First_Pass, Is_First_Failure;

    z_Pass_Count = z_Fail_Count = sample_number = 0;
    Is_First_Pass = Is_First_Failure = FALSE;

    for each sample {
        Is_Valid_Sample = mask_PIXEL & 0x1;
        mask_PIXEL >>= 1;
        sample_number++;
        if (Is_Valid_Sample) {
            if (Z_Test_Enabled && !No_Z_Buffer) {
                if (doSampleDepthTest()) {
                    doBlendEtc();
                    z_Pass_Count++;
                    if (sample_number == 1)
                        Is_First_Pass = TRUE;
                } else {
                    z_Fail_Count++;
                    if (sample_number == 1)
                        Is_First_Failure = TRUE;
                }
            } else {
                doBlendEtc();
                z_Pass_Count++;
                if (sample_number == 1)
                    Is_First_Pass = TRUE;
            }
        }
    }
    if (Stencil_Test_Enabled && !No_Stencil_Buffer) {
        if (StencilFirst == 1) {
            if (Is_First_Pass)
                doPixelStencil_and_Z_Tests_Passed_Operation();
            else if (Is_First_Failure)
                doPixelStencil_Test_Passed_Z_Test_Failed_Operation();
        } else {
            if (z_Pass_Count >= z_Fail_Count)
                doPixelStencil_and_Z_Tests_Passed_Operation();
            else
                doPixelStencil_Test_Passed_Z_Test_Failed_Operation();
        }
    }
} /* doPixelDepthTest() */

boolean doSampleDepthTest() {
    if (!No_Z_Buffer) {
        doComputeDepth();
        if (!depthTest)
            /* Compare z values according to depthFunc */
            return FALSE;
        else {
            set Z_Visible bit;
            updateDepthBuffer();
            doBlendEtc();
            return TRUE;
        }
    } else
        return TRUE;
}



doComputeDepth(index_PIXEL, index_SAMPLE) { /* pixel and sample number are known */

    /* sub-pixel units per pixel in the X axis in one embodiment */
    #define SUBPIXELS_PER_PIXEL_IN_X 16

    /* bits to represent SUBPIXELS_PER_PIXEL_IN_X */
    #define SUBPIXEL_BIT_COUNT_X log2(SUBPIXELS_PER_PIXEL_IN_X)

    /* pixels per stamp in the X axis in one embodiment */
    #define PIXELS_PER_STAMP_IN_X 2

    /* bits to represent PIXELS_PER_STAMP_IN_X */
    #define PIXEL_BIT_COUNT_X log2(PIXELS_PER_STAMP_IN_X)

    #define SUBPIXELS_PER_PIXEL_IN_Y 16
    #define SUBPIXEL_BIT_COUNT_Y log2(SUBPIXELS_PER_PIXEL_IN_Y)
    #define PIXELS_PER_STAMP_IN_Y 2
    #define PIXEL_BIT_COUNT_Y log2(PIXELS_PER_STAMP_IN_Y)

    /* lower left of the pixel in sub-pixel units */
    index_X = (index_PIXEL & PIXEL_BIT_COUNT_X) << SUBPIXEL_BIT_COUNT_X;
    index_Y = ((index_PIXEL >> PIXEL_BIT_COUNT_X) & PIXEL_BIT_COUNT_Y)
              << SUBPIXEL_BIT_COUNT_Y;

    if (!Is_MultiSample) {
        /* in aliased mode, the sample position is at the center
           of the pixel */
        /* account for Z_REFERENCE at the center of stamp */
        dx = index_X - 8;
        dy = index_Y - 8;
    } else {
        dx = index_X + sampleX[index_SAMPLE] - 16;
        dy = index_Y + sampleY[index_SAMPLE] - 16;
    }
    Z_SAMPLE = Z_REFERENCE + dZdX * dx + dZdY * dy;
}
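
The depth interpolation above can be written as compilable C along the following lines. This is a sketch, not the patented implementation: the sample-location tables are invented for illustration, and plain 64-bit integers stand in for the fixed-point formats, so no fractional-bit scaling is shown.

```c
#include <stdbool.h>
#include <stdint.h>

enum { SUBPIXELS_PER_PIXEL = 16, SAMPLES_PER_PIXEL = 4 };

/* Hypothetical sample locations on the 16x16 sub-pixel grid. */
static const int sample_x[SAMPLES_PER_PIXEL] = { 4, 12, 4, 12 };
static const int sample_y[SAMPLES_PER_PIXEL] = { 4, 4, 12, 12 };

/* Interpolate a sample's z from the stamp's reference z and depth slopes.
   Offsets are in sub-pixel units from the stamp center at (16, 16). */
static int64_t compute_sample_z(int64_t z_reference, int64_t dzdx, int64_t dzdy,
                                unsigned pixel_index, unsigned sample_index,
                                bool is_multisample)
{
    /* Lower-left corner of the pixel within its 2x2 stamp, in sub-pixels. */
    int index_x = (int)(pixel_index & 1u) * SUBPIXELS_PER_PIXEL;
    int index_y = (int)((pixel_index >> 1) & 1u) * SUBPIXELS_PER_PIXEL;
    int dx, dy;
    if (!is_multisample) {
        dx = index_x - 8;   /* pixel center: index + 8 - 16 */
        dy = index_y - 8;
    } else {
        dx = index_x + sample_x[sample_index] - 16;
        dy = index_y + sample_y[sample_index] - 16;
    }
    return z_reference + dzdx * dx + dzdy * dy;
}
```

In aliased mode, pixel 0 sits 8 sub-pixel units below and to the left of the stamp center, so a reference z of 1000 with slopes 2 and 3 gives 1000 - 16 - 24 = 960.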



- Input Queuing and Filtering 

The mode-injection and Phong blocks 847 and 84A place the data packets in
the input FIFOs 210. The data from the Phong block 84A is placed in the fragment color
queue 230. For the input packets received from the mode-injection block 847, the input
filter 220 looks at the packet header and determines whether the packet is to be passed
through to the back-end block 84C, placed in the VSP queue 240, sent to the pixel-out unit
280 or some combination of the three. The pipeline may stall if a packet (bound for the
back-end block 84C, VSP queue 240, color queue 230 or the pixel-out input queue) cannot
be delivered due to insufficient room in the destination queue.

In one embodiment, the VSP queue 240 and the color queue 230 are each a
series of fixed-size records (150 records of 128 bits each for the VSP queue 240 and 128
records of 34 bits each for the color queue 230). Each packet received occupies an integer
number of records. The number of records a packet occupies in a queue depends on its
type and, thus, its size.

The pixel block 84B maintains a write pointer and a read pointer for each
queue 230, 240 and writes packets bound for a queue into the queue, starting at the record
indexed by the write pointer. The pixel block 84B appropriately increments the write
pointer, depending on the number of records the packet occupies and accounting for
circular queues. If, after incrementing a queue's write pointer, the pixel block 84B
determines that the value held by the write pointer equals that held by the read pointer, it
sets the queue's status to "full."

The block 84B retrieves packets from the record indexed by the read pointer
and appropriately increments the read pointer, based on the packet type and accounting for
circular queues. If, after incrementing a queue's read pointer, the pixel block 84B
determines the value held by the read pointer equals that held by the write pointer, it sets
the input queue's status to "empty."

Subsequent read and write operations on a queue reset the full and empty 
status bits appropriately. 
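
The read-pointer, write-pointer and status-bit protocol can be sketched in C as below. The queue is shortened to eight records for illustration (the embodiment uses 150 and 128), record payloads are omitted, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum { QUEUE_RECORDS = 8 };  /* illustrative; the text uses 150 and 128 */

typedef struct {
    size_t read, write;   /* record indices */
    bool full, empty;     /* status bits */
} record_queue;

static void queue_init(record_queue *q)
{
    q->read = q->write = 0;
    q->full = false;
    q->empty = true;
}

/* Number of records that can still be written. */
static size_t free_records(const record_queue *q)
{
    if (q->full) return 0;
    if (q->empty) return QUEUE_RECORDS;
    return (q->read + QUEUE_RECORDS - q->write) % QUEUE_RECORDS;
}

/* Reserve `records` records for a packet; false means the writer must stall. */
static bool queue_write(record_queue *q, size_t records)
{
    if (records == 0 || records > free_records(q)) return false;
    q->write = (q->write + records) % QUEUE_RECORDS;
    q->empty = false;
    q->full = (q->write == q->read);   /* pointers met after a write */
    return true;
}

/* Consume a packet of `records` records; false means the reader must stall. */
static bool queue_read(record_queue *q, size_t records)
{
    if (records == 0 || records > QUEUE_RECORDS - free_records(q)) return false;
    q->read = (q->read + records) % QUEUE_RECORDS;
    q->full = false;
    q->empty = (q->read == q->write);  /* pointers met after a read */
    return true;
}
```

Equal pointers are ambiguous on their own, which is why the sketch, like the text, records whether the last pointer to move was the write pointer (full) or the read pointer (empty).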

- Input Processing

The pixel block input processor 290 retrieves packets from the VSP and
color queues 240 and 230. The input processor 290 stalls if a queue is empty. All packets
are processed in the order received. (The VSP queue 240 does not hold only VSP packets
but other input packets from the mode-injection block 847 as well: Begin_Tile,
Begin_Frame and pixel-mode Stipple packets, for example.)

Before processing a VSP record from the queue 240, the input processor
290 checks to see if it can read the fragment colors (and/or depth/stencil data)
corresponding to the VSP record from the color queue 230. If the queue 230 has not yet
received the data from the Phong block 84A, the input processor 290 stalls until it can read
all the color fragments for the VSP record.

Once the required data from the Phong block 84A is received, the input
processor 290 starts processing the records in the input queue 240 in order. For each VSP
record, it retrieves the color and mode information as needed and passes it on to the pixel
pipeline 2M0. If the input processor 290 encounters a pixel-mode or stipple Cache_Fill
packet, it uses the cache index supplied with the packet to copy it into the appropriate
cache entry.

- Scissor Test 

21 The scissor-test unit 2 AO performs the scissor test, the elimination of pixel 

fragments that fall outside a specified rectangular area. The scissor rectangle is specified in 
window coordinates with pixel (rather than sub-pixel) resolution. The scissor-test unit 2 AO 
uses the tile and stamp locations forwarded by the input processor 290 to determine if a 

25 fragment is outside the scissor window. The pseudo-code of the logic is given below: 



boolean Is_Valid_Fragment;
boolean Passes_Scissor_Test () {
    if (Scissor_Test_Enabled) {
        /* window coordinates of the pixel within its 2x2 stamp */
        x_WINDOW = Tile_X_Location + 2 * Stamp_X_Index
                   + (index_PIXEL & 0x1);
        y_WINDOW = Tile_Y_Location + 2 * Stamp_Y_Index
                   + ((index_PIXEL >> 1) & 0x1);
        Is_Valid_Fragment = (x_WINDOW >= x_SCISSOR_MIN) &&
                            (x_WINDOW <= x_SCISSOR_MAX) &&
                            (y_WINDOW >= y_SCISSOR_MIN) &&
                            (y_WINDOW <= y_SCISSOR_MAX);
        return Is_Valid_Fragment;
    } else {
        return TRUE;
    }
}



where x_SCISSOR_MAX, x_SCISSOR_MIN, y_SCISSOR_MAX and y_SCISSOR_MIN are the maximum and
minimum x values and the maximum and minimum y values for valid pixels.

The pixel block 84B discards the fragment if Is_Valid_Fragment is false.
Otherwise it passes the fragment on to the next stage of the pipeline. The scissor-test unit
2A0 also sends the (x_WINDOW, y_WINDOW) window coordinates to the stipple-test unit 2B0.
This test is done on a per-pixel basis.
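
The scissor logic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the hardware implementation: the function name passes_scissor_test and the tuple layout of the scissor argument are introduced here for illustration, and a stamp is assumed to be 2 x 2 pixels, with bit 0 of the pixel index selecting the column and bit 1 the row.

```python
def passes_scissor_test(tile_x, tile_y, stamp_x, stamp_y, pixel_index,
                        scissor, enabled=True):
    """Return (passes, x_window, y_window) for one pixel of a 2x2 stamp.

    scissor is (x_min, y_min, x_max, y_max); all bounds are inclusive.
    """
    # Window coordinates: tile origin + stamp offset + pixel offset in stamp.
    x_window = tile_x + 2 * stamp_x + (pixel_index & 0x1)
    y_window = tile_y + 2 * stamp_y + ((pixel_index >> 1) & 0x1)
    if not enabled:
        return True, x_window, y_window
    x_min, y_min, x_max, y_max = scissor
    inside = x_min <= x_window <= x_max and y_min <= y_window <= y_max
    return inside, x_window, y_window
```

The window coordinates are returned alongside the result because the text has the scissor-test unit forward them to the stipple-test unit.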



- Stipple Test 

The stipple-test unit 2B0 performs the stipple test if the
Stipple_Test_Enabled flag is set (that is to say, is TRUE). Otherwise, the unit 2B0 passes
the fragment on to the next stage of the pipeline.

The stipple-test unit 2B0 uses the following logic:

boolean Is_Valid_Fragment;
boolean Passes_Stipple_Test () {
    if (Stipple_Test_Enabled) {
        /* OpenGL uses 32x32 stipple patterns
           with each bit representing a pixel. */
        stipple_X_index = (x_WINDOW & 0x1F);
        stipple_Y_index = (y_WINDOW & 0x1F);
        Is_Valid_Fragment = stipple [stipple_Y_index,
                                     stipple_X_index] == 1;
        return Is_Valid_Fragment;
    } else {
        return TRUE;
    }
}

The stipple-test unit uses the coordinates (stipple_X_index,
stipple_Y_index) to retrieve the stipple bit for the given pixel. If the stipple bit at
(stipple_X_index, stipple_Y_index) is not set (that is to say, is FALSE), the stipple test
fails, and the pixel block 84B discards the fragment.

The stipple test is a per-fragment operation.
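
As an illustration of the stipple lookup, the following Python sketch masks the window coordinates to five bits and indexes a 32 x 32 pattern. The representation of the pattern as a list of rows and the checkerboard test pattern are assumptions made for this example only.

```python
def passes_stipple_test(stipple, x_window, y_window, enabled=True):
    """stipple is a list of 32 rows, each a list of 32 bits (0 or 1)."""
    if not enabled:
        return True
    # The low five bits of each window coordinate wrap into the pattern.
    stipple_x_index = x_window & 0x1F
    stipple_y_index = y_window & 0x1F
    return stipple[stipple_y_index][stipple_x_index] == 1


# Hypothetical checkerboard: bit set where (row + column) is even.
checker = [[(row + col + 1) & 1 for col in range(32)] for row in range(32)]
```

Because only the low five bits are used, the pattern repeats every 32 pixels in each direction across the window.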

- Alpha Test 

The alpha-test unit 2C0 keeps or discards an incoming fragment based on its
alpha values. The unit 2C0 tests the opacity of the fragment with respect to a reference
value, alpha_REFERENCE, according to a specified alpha test function, Function_ALPHA. (Table 11
shows the values for Function_ALPHA and the associated comparisons according to one
embodiment.) If the fragment fails, the alpha-test unit 2C0 discards it. If it passes, the unit
2C0 sends it on to the next stage in the pipeline.

The alpha-test unit 2C0 uses the following logic:



boolean Passes_Alpha_Test () {
    if (Alpha_Test_Enabled) {
        switch (Function_ALPHA) {
            case NEVER:    return FALSE;
            case LESS:     return A < alpha_REFERENCE;
            case EQUAL:    return A == alpha_REFERENCE;
            case LEQUAL:   return A <= alpha_REFERENCE;
            case GREATER:  return A > alpha_REFERENCE;
            case NEQUAL:   return A != alpha_REFERENCE;
            case GEQUAL:   return A >= alpha_REFERENCE;
            otherwise:     return TRUE;
        }
    } else {
        return TRUE;
    }
}

The alpha test is enabled if the Alpha_Test_Enabled flag is set. If the alpha
test is disabled, all fragments are passed through. This test applies in RGBA-color mode
only. It is bypassed in color-index mode.

The alpha test is a per-fragment operation.
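
A table-driven version of the alpha comparison can be sketched in Python as follows. The dictionary name ALPHA_FUNCTIONS and the use of strings for the function values are assumptions for illustration; Table 11 defines the actual encoding in one embodiment.

```python
import operator

# Comparison selected by Function_ALPHA; unknown values fall through to
# "always pass", mirroring the "otherwise" branch of the pseudo-code.
ALPHA_FUNCTIONS = {
    "NEVER": lambda a, ref: False,
    "LESS": operator.lt,
    "EQUAL": operator.eq,
    "LEQUAL": operator.le,
    "GREATER": operator.gt,
    "NEQUAL": operator.ne,
    "GEQUAL": operator.ge,
    "ALWAYS": lambda a, ref: True,
}


def passes_alpha_test(alpha, reference, function, enabled=True):
    """Return True if the fragment's alpha passes against the reference."""
    if not enabled:
        return True
    compare = ALPHA_FUNCTIONS.get(function, ALPHA_FUNCTIONS["ALWAYS"])
    return compare(alpha, reference)
```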

- Color Test 

Unlike the alpha-test unit and its single reference-value test, the color-test
unit 2D0 compares a fragment's RGB value with a range of color values via the keys
color_MIN and color_MAX. (The color keys are inclusive of the minimum and maximum
values.) If the fragment fails the color test, the unit 2D0 discards it. Otherwise, the unit
2D0 passes it down to the next stage in the pipeline.

The color-test unit 2D0 uses the following logic:



boolean Passes_Color_Test () {
    if (Color_Test_Enabled) {
        switch (Function_COLOR) {
            case NEVER:    return FALSE;
            case LESS:     return C < color_MIN;
            case EQUAL:    return (C >= color_MIN) & (C <= color_MAX);
            case LEQUAL:   return C <= color_MAX;
            case GREATER:  return C > color_MAX;
            case NEQUAL:   return (C < color_MIN) | (C > color_MAX);
            case GEQUAL:   return C >= color_MIN;
            otherwise:     return TRUE;
        }
    } else {
        return TRUE;
    }
}



Table 12 shows the values for Function_COLOR and the associated comparisons
according to one embodiment. Function_COLOR is implemented such that the minimum and
maximum inclusiveness in the color keys is accounted for appropriately.

The color test is bypassed if the Color_Test_Enabled flag is not set.
The color test is applied in RGBA mode only. In the color-index mode, it is
bypassed. The color-test unit 2D0 applies the color test to each of the R, G and B channels
separately. The test results for all the channels are logically ANDed. That is to say, the
fragment passes the color test only if it passes for every one of the channels.
The color test is a per-fragment operation.
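
The per-channel range comparison and the ANDing of the channel results can be sketched in Python. The function names and the string encoding of the test function are assumptions for illustration, and the choice of color_MIN or color_MAX in each one-sided comparison follows the inclusive-key reading of the pseudo-code above.

```python
def channel_passes(c, c_min, c_max, function):
    """Range comparison for one channel; the keys c_min and c_max are inclusive."""
    if function == "NEVER":
        return False
    if function == "LESS":
        return c < c_min
    if function == "EQUAL":
        return c_min <= c <= c_max
    if function == "LEQUAL":
        return c <= c_max
    if function == "GREATER":
        return c > c_max
    if function == "NEQUAL":
        return c < c_min or c > c_max
    if function == "GEQUAL":
        return c >= c_min
    return True


def passes_color_test(rgb, rgb_min, rgb_max, function, enabled=True):
    """A fragment passes only if every one of R, G and B passes."""
    if not enabled:
        return True
    return all(channel_passes(c, lo, hi, function)
               for c, lo, hi in zip(rgb, rgb_min, rgb_max))
```

Note that with NEQUAL a fragment is kept only when all three channels lie outside their ranges, a direct consequence of the logical AND.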



- Stencil/Z Test

While the alpha and color tests operate only on fragments passing through
the pipeline stages, the stencil test uses the stencil buffer 210 to operate on a sample or a
fragment. The stencil-test unit 2E0 compares the reference stencil value, stencil_REFERENCE, with
what is already in the stencil buffer 210 at that location. The unit 2E0 bitwise ANDs both
the stencil_REFERENCE and the stencil buffer values with the stencil mask, mask_STENCIL, before
invoking the comparison specified by Function_STENCIL.

In one embodiment, the Function_STENCIL state parameter specifies
comparisons parallel to those of Function_ALPHA and Function_COLOR.

If the stencil test fails, the sample is discarded and the stored stencil value is
modified according to the Stencil_Test_Failed_Operation state parameter.

If the stencil test passes, the sample is subjected to a depth test. If the depth
test fails, the stored stencil value is modified according to the
Stencil_Test_Passed_Z_Test_Failed_Operation state parameter.

If both the stencil and depth tests pass, the stored stencil value is modified
according to the Stencil_and_Z_Tests_Passed_Operation state parameter.

Table 13 shows the values for the Stencil_Test_Failed_Operation,
Stencil_Test_Passed_Z_Test_Failed_Operation and
Stencil_and_Z_Tests_Passed_Operation state parameters and their associated functions
according to one embodiment.

The unit 2E0 masks the stencil bits with the write_mask_STENCIL state
parameter before writing them into the sample tile buffers. The major difference between
pixel and sample stencil operations lies in how the stencil value is retrieved from and
written into the tile buffer. The write_mask_STENCIL state parameter differs from mask_STENCIL
in that mask_STENCIL affects the stencil values used in the stencil test, whereas
write_mask_STENCIL affects the bitplanes to be updated.

Considering the overview pseudo-code given above, the following pseudo-code
further describes the logic of the stencil-test unit 2E0:



boolean Passes_Stencil_Test () {
    boolean Is_Valid;

    if (No_Stencil_Buffer) {
        return TRUE;
    } else if (Stencil_Test_Enabled) {
        Set_Stencil_Buffer_Pointer (pointer);
        source = (*pointer) & mask_STENCIL;
        reference = stencil_REFERENCE & mask_STENCIL;
        switch (Function_STENCIL) {
            case NEVER:    Is_Valid = FALSE;
                           break;
            case LESS:     Is_Valid = source < reference;
                           break;
            case EQUAL:    Is_Valid = (source == reference);
                           break;
            case LEQUAL:   Is_Valid = source <= reference;
                           break;
            case GREATER:  Is_Valid = source > reference;
                           break;
            case NEQUAL:   Is_Valid = (source < reference)
                                      | (source > reference);
                           break;
            case GEQUAL:   Is_Valid = source >= reference;
                           break;
            case ALWAYS:
            otherwise:     Is_Valid = TRUE;
        }
        return (Is_Valid);
    } else
        return TRUE;
}

doStencil_Test_Failed_Operation () {
    switch (Stencil_Test_Failed_Operation) {
        case ZERO:       value = 0;
                         break;
        case MAX_VALUE:  value = (Stencil_Mode ? 255 : 3);
                         break;
        case REPLACE:    value = stencil_REFERENCE;
                         break;
        case INCR:       value = (*pointer) + 1;
                         break;
        case DECR:       value = (*pointer) - 1;
                         break;
        case INCRSAT:    /* increment, saturating at the maximum value */
                         if ((value = (*pointer) + 1) >
                             (Stencil_Mode ? 255 : 3)) {
                             value = (Stencil_Mode ? 255 : 3);
                         }
                         break;
        case DECRSAT:    /* decrement, saturating at zero */
                         if ((value = (*pointer) - 1) < 0) {
                             value = 0;
                         }
                         break;
        case INVERT:     value = ~(*pointer);
                         break;
        case KEEP:
        otherwise:       value = *pointer;
    }
    if (!No_Saved_Stencil_Buffer) {
        /* write stencil tile */
        *pointer = value & write_mask_STENCIL;
    }
}

doStencil_Test_Passed_Z_Test_Failed_Operation () {
    switch (Stencil_Test_Passed_Z_Test_Failed_Operation) {
        /* same logic as the switch () {} in
           doStencil_Test_Failed_Operation () */
    }
    if (!No_Saved_Stencil_Buffer) {
        /* write stencil tile */
        *pointer = value & write_mask_STENCIL;
    }
}

doStencil_and_Z_Tests_Passed_Operation () {
    switch (Stencil_and_Z_Tests_Passed_Operation) {
        /* same logic as the switch () {} in
           doStencil_Test_Failed_Operation () */
    }
    if (!No_Saved_Stencil_Buffer) {
        /* write stencil tile */
        *pointer = value & write_mask_STENCIL;
    }
}



The state parameter Stencil_Mode from a Begin_Frame packet specifies
whether the stencil test and save are per-pixel or per-sample operations and, thus, specifies
the number of bits involved in the operations (in one embodiment, 2 or 8 bits).

When Stencil_Mode is TRUE, the stencil operations are per pixel, but the
depth testing is per sample. For a given pixel, some of the samples may pass the depth test
and some may fail the depth test. In such cases, the state parameter StencilFirst from the
Begin_Frame packet determines which of the stencil update operations is carried out. If
StencilFirst is TRUE, then the depth-test result for the first sample in the pixel determines
which of the Stencil_and_Z_Tests_Passed_Operation and
Stencil_Test_Passed_Z_Test_Failed_Operation is invoked. Otherwise, majority rule is used
to decide the update operation. The overview pseudo-code for pixel-block data flow
outlines the interaction between the stencil- and the depth-testing operations.
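
The choice between the two stencil update operations in per-pixel stencil mode can be sketched in Python. The function name is hypothetical, and the handling of a tie under majority rule is an assumption (here a tie counts as a pass); the text does not say how ties are resolved.

```python
def select_stencil_update(sample_depth_results, stencil_first):
    """Pick the stencil update operation for one pixel.

    sample_depth_results holds one boolean per sample (True = depth pass).
    """
    if stencil_first:
        # The first sample in the pixel decides.
        depth_passed = sample_depth_results[0]
    else:
        passes = sum(sample_depth_results)
        # Majority rule; treating a tie as a pass is an assumption.
        depth_passed = 2 * passes >= len(sample_depth_results)
    if depth_passed:
        return "Stencil_and_Z_Tests_Passed_Operation"
    return "Stencil_Test_Passed_Z_Test_Failed_Operation"
```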

The stencil test is enabled with the Stencil_Test_Enabled flag. The
No_Stencil_Buffer flag passed down with the Begin_Frame packet also affects the behavior
of the test. Table 16 shows the actions of the stencil-test unit 2E0 based on the settings of
the Stencil_Test_Enabled, No_Stencil_Buffer and No_Saved_Stencil_Buffer flags. As Table
16 shows, the No_Stencil_Buffer flag overrides other stencil-related rendering state
parameters.

The stencil test can be performed on a per-fragment or per-pixel basis. 
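
The masked comparison and the stencil update operations can be illustrated with a short Python sketch. It follows the comparison direction of the pseudo-code (stored value on the left, reference on the right). The function names, the string encodings and the per_pixel argument (standing in for Stencil_Mode, 8 bits when TRUE and 2 bits otherwise) are assumptions for illustration.

```python
import operator

STENCIL_FUNCTIONS = {
    "NEVER": lambda s, r: False, "LESS": operator.lt, "EQUAL": operator.eq,
    "LEQUAL": operator.le, "GREATER": operator.gt, "NEQUAL": operator.ne,
    "GEQUAL": operator.ge, "ALWAYS": lambda s, r: True,
}


def passes_stencil_test(stored, reference, mask, function):
    """Compare the masked stored value against the masked reference."""
    compare = STENCIL_FUNCTIONS.get(function, STENCIL_FUNCTIONS["ALWAYS"])
    return compare(stored & mask, reference & mask)


def apply_stencil_operation(stored, reference, operation, write_mask,
                            per_pixel=True):
    """Return the new stencil value after one update operation."""
    max_value = 255 if per_pixel else 3
    if operation == "ZERO":
        value = 0
    elif operation == "MAX_VALUE":
        value = max_value
    elif operation == "REPLACE":
        value = reference
    elif operation == "INCR":
        value = stored + 1          # wraps once the write mask is applied
    elif operation == "DECR":
        value = stored - 1
    elif operation == "INCRSAT":
        value = min(stored + 1, max_value)
    elif operation == "DECRSAT":
        value = max(stored - 1, 0)
    elif operation == "INVERT":
        value = ~stored
    else:                           # KEEP and anything unrecognized
        value = stored
    return value & write_mask
```

The sketch keeps the two masks separate, as the text does: mask affects only the comparison, while write_mask affects only which bitplanes are updated.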

-- DrawStencil Functionality

Under certain circumstances, the pixel block 84B may receive a per-pixel
stencil value from the Phong block 84A. The pixel block 84B treats this per-pixel stencil
value in a manner similar to the stencil reference value, stencil_REFERENCE. If the Stencil_Mode
state parameter specifies per-sample operations, the pixel block unit 84B uses the stencil
value from the Phong block 84A for all samples of the fragment.

For example, if an application 8211 seeks to copy a pixel rectangle into the
stencil buffer and per-sample operations are 8-bit operations, the stencil state parameters are
set as follows:



DrawStencil                                    TRUE
Stencil_Test_Enabled                           TRUE
Function_STENCIL                               ALWAYS
mask_STENCIL                                   0xff
write_mask_STENCIL                             0xff
Stencil_Test_Failed_Operation                  REPLACE
Stencil_Test_Passed_Z_Test_Failed_Operation    REPLACE
Stencil_and_Z_Tests_Passed_Operation           REPLACE
No_Stencil_Buffer                              FALSE
No_Saved_Stencil_Buffer                        FALSE

- Depth Test

The depth buffer-test unit 2E0 compares a sample's z value with that stored
in the z-buffer 210 and discards the sample if the depth comparison fails.

If the depth test passes and Z_Write_Enabled is TRUE, the depth-test unit
2E0 assigns the buffer at the sample's location the sample Z value clamped to the range
[0, 2^Z_VALUE_BIT_COUNT - 1]. (In one embodiment, Z values are 24-bit values, and thus
Z_VALUE_BIT_COUNT is set to 24.) The unit 2E0 updates the stencil buffer value
according to the Stencil_and_Z_Tests_Passed_Operation state parameter. The unit 2E0
passes the sample on to the blend unit.

If the depth test fails, the unit 2E0 discards the fragment and updates the
stencil value at the sample's location according to the
Stencil_Test_Passed_Z_Test_Failed_Operation state parameter.

Considering the overview pseudo-code given above, the following pseudo-code
further describes the logic of the depth-test unit 2E0 and the interaction between
depth-testing and stencil operations.



boolean Passes_Z_Test () {
    boolean Is_Valid;

    if (No_Z_Buffer) {
        return TRUE;
    } else if (Z_Test_Enabled) {
        Set_Z_Buffer_Pointer (pointer);
        destination = *pointer;
        switch (Function_DEPTH) {
            case LESS:     Is_Valid = Z < destination;
                           break;
            case GREATER:  Is_Valid = Z > destination;
                           break;
            case EQUAL:    Is_Valid = (Z == destination);
                           break;
            case NEQUAL:   Is_Valid = (Z > destination) | (Z < destination);
                           break;
            case LEQUAL:   Is_Valid = Z <= destination;
                           break;
            case GEQUAL:   Is_Valid = (Z >= destination);
                           break;
            case NEVER:    Is_Valid = FALSE;
                           break;
            case ALWAYS:
            otherwise:     Is_Valid = TRUE;
        }
        return (Is_Valid);
    } else
        return TRUE;
}

Five state parameters affect the depth-related operations in the pixel block
84B, namely, Z_Test_Enabled, Z_Write_Enabled, No_Z_Buffer, Function_DEPTH and
No_Saved_Z_Buffer. A pixel-mode Cache_Fill packet supplies the current values of the
Function_DEPTH, Z_Test_Enabled and Z_Write_Enabled state parameters, while the
Begin_Frame packet supplies the current values of the No_Z_Buffer and
No_Saved_Z_Buffer state parameters.

Clearing the Z_Test_Enabled flag disables the comparison. With depth testing
disabled, the unit 2E0 bypasses the depth comparison and any subsequent updates to the
depth-buffer value and passes the fragment on to the next operation. The stencil value,
however, is modified as if the depth test passed.

Table 14 further describes the interaction of the four parameters,
Z_Test_Enabled, Z_Write_Enabled, No_Z_Buffer and No_Saved_Z_Buffer.
As mentioned elsewhere herein, the depth-buffer operations happen only if No_Z_Buffer is
FALSE.

The depth test is a per-sample operation. In the aliased mode
(Is_MultiSample is FALSE), the depth values are computed at the center of the fragment
and assigned to each sample in the fragment. The cull block 846 appropriately generates
the sample coverage mask so that, in the aliased mode, all samples are either on or off
depending on whether the pixel center is included in the primitive or not.
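
The per-sample depth comparison, the conditional write and the clamp to the z range can be sketched together in Python. The function name and the list used as a z-buffer are assumptions for illustration; the sketch does not model the No_Z_Buffer or No_Saved_Z_Buffer flags or the stencil updates.

```python
import operator

Z_VALUE_BIT_COUNT = 24
Z_MAX = (1 << Z_VALUE_BIT_COUNT) - 1

DEPTH_FUNCTIONS = {
    "NEVER": lambda z, d: False, "LESS": operator.lt, "EQUAL": operator.eq,
    "LEQUAL": operator.le, "GREATER": operator.gt, "NEQUAL": operator.ne,
    "GEQUAL": operator.ge, "ALWAYS": lambda z, d: True,
}


def depth_test_sample(z, z_buffer, index, function="LESS",
                      test_enabled=True, write_enabled=True):
    """Return True if the sample passes; update z_buffer in place on a pass."""
    if not test_enabled:
        # Comparison and depth-buffer update are both bypassed.
        return True
    compare = DEPTH_FUNCTIONS.get(function, DEPTH_FUNCTIONS["ALWAYS"])
    passed = compare(z, z_buffer[index])
    if passed and write_enabled:
        # Clamp to [0, 2^Z_VALUE_BIT_COUNT - 1] before storing.
        z_buffer[index] = min(max(z, 0), Z_MAX)
    return passed
```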

-- Z_Visible

The pixel block 84B internally maintains a software-accessible register,
the Z_Visible register 2N0. The block 84B clears this register 2N0 on encountering a
Begin_Frame packet. The block 84B sets its value when it encounters the first visible
sample of an object and clears it on read.



- Blending 

Blending combines a sample's R, G, B and A values with the R, G, B and A
values stored at the sample's location in the framebuffer 84G. The blended color is
computed as:

    Function_BLEND (Source_Color_Factor * Color_SOURCE,
                    Destination_Color_Factor * Color_DESTINATION)

where Function_BLEND is a state parameter specifying what operation to apply to the two
products, and Source_Color_Factor and Destination_Color_Factor are state parameters
affecting the color-blending operation. (The sample is the "source" and the framebuffer the
"destination.")

Table 18 gives values in one embodiment for Function_BLEND (x, y). The
function options include addition, subtraction, reverse subtraction, minimum and
maximum.

Source_Color_Factor specifies the multiplicand for the sample color-value
multiplication, while Destination_Color_Factor specifies the multiplicand for the
framebuffer color-value multiplication. Table 17 gives values in one embodiment for the
Source_Color_Factor and Destination_Color_Factor state parameters. (The subscript "S"
and "D" terms in Table 17 are abbreviations for "SOURCE" and "DESTINATION." The
"f" term in Table 17 is an abbreviation for "MINIMUM (A_SOURCE, 1 - A_DESTINATION).")

The color and alpha results are clamped in the range
[0, 2^COLOR_VALUE_BIT_COUNT - 1]. In one embodiment, color and alpha values are 8-bit values,
and thus COLOR_VALUE_BIT_COUNT is 8.
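
The blend computation and the final clamp can be sketched per channel in Python. The string names for the function options and the use of floating-point factors in [0, 1] are assumptions for illustration; Tables 17 and 18 define the actual encodings in one embodiment. As in the formula above, the function is applied to the two products, including for minimum and maximum.

```python
COLOR_VALUE_BIT_COUNT = 8
COLOR_MAX = (1 << COLOR_VALUE_BIT_COUNT) - 1

# Each option combines the source product s and the destination product d.
BLEND_FUNCTIONS = {
    "ADD": lambda s, d: s + d,
    "SUBTRACT": lambda s, d: s - d,
    "REVERSE_SUBTRACT": lambda s, d: d - s,
    "MIN": min,
    "MAX": max,
}


def blend_channel(source, destination, source_factor, destination_factor,
                  function="ADD"):
    """Blend one 8-bit channel and clamp the result to [0, COLOR_MAX]."""
    combined = BLEND_FUNCTIONS[function](source_factor * source,
                                         destination_factor * destination)
    return min(max(int(round(combined)), 0), COLOR_MAX)
```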

The Blending_Enabled state parameter enables blending, and blending is
enabled only in RGBA-color mode. The Blending_Enabled value comes from a pixel-mode
packet.

The write_mask_RGBA state parameter determines which bitplanes of the red,
green, blue and alpha channels are updated.

The No_Color_Buffer and No_Saved_Color_Buffer state parameters also
affect the blending operation. Their current values are from a Begin_Frame packet.

Table 15 illustrates the effect of these state parameters on blending in the
pipeline.

Alpha values are processed similarly. The Source_Alpha_Factor,
Destination_Alpha_Factor and Function_ALPHA state parameters control alpha blending. The
Function_ALPHA is similar to Function_COLOR, in one embodiment taking the same set of values.
Source_Alpha_Factor specifies the multiplicand for the sample alpha-value multiplication,
while Destination_Alpha_Factor specifies the multiplicand for the framebuffer alpha-value
multiplication. Table 19 lists the possible values in one embodiment for
Source_Alpha_Factor and Destination_Alpha_Factor. (The subscript "S" and "D" terms in
Table 19 are abbreviations for "SOURCE" and "DESTINATION.")

The sample buffer color and alpha are updated with the new values. The
dirty bit for this sample is also set.

The pipeline 840 generates colors and alphas on a per-fragment basis. For
blending, the same source color and alpha apply to all covered samples within the fragment.

Either the blend operation or the logical operations can be active at any
given time but not both. Also, although OpenGL allows both logical operations and
blending to be disabled, the practical effect is the same as if the source values are written
into the destination.

- Dithering 

The pipeline 840 incorporates dithering via three M x M dither matrices,
Red_Dither, Green_Dither and Blue_Dither, corresponding to the dithering of each of the
red, green and blue components, respectively. The low log2(M) bits of the pixel coordinate
(x_WINDOW, y_WINDOW) index into each color-component dither matrix. The indexed matrix
element is added to the blended color value. The computed red, green and blue values are
truncated to the desired number of bits on output.

(Dithering does not alter the alpha values.)

m_int Red_Dither [M, M];
m_int Green_Dither [M, M];
m_int Blue_Dither [M, M];

#define mask (M - 1)

x_DITHER = x_WINDOW & mask;
y_DITHER = y_WINDOW & mask;
red   += Red_Dither [x_DITHER, y_DITHER];
green += Green_Dither [x_DITHER, y_DITHER];
blue  += Blue_Dither [x_DITHER, y_DITHER];



The Dithering_Enabled state parameter enables the dithering of blended
colors. Therefore, if blending is disabled, dithering is disabled as well. Since blending is
disabled in color-index mode, dithering is also disabled in color-index mode. Table 20
illustrates the effects of the Dithering_Enabled and Blending_Enabled flags.

The specifics of one embodiment are as follows: The rendering pipeline 840
has 8 bits for each color component. The output pixel formats may need to be dithered
down to as little as 4 bits per color component. The matrix size M is then 4, and each
matrix element is an unsigned 4-bit integer.

In most cases, having one dither matrix applied to all color components may
be adequate. However, in some cases, such as converting from RGB888 to RGB565
formats, separate dither matrices for the red, green and blue channels may be desirable. For
this reason, the pipeline 840 uses separate dither matrices for red, green and blue
components.

Four-bit elements suffice to dither the 8-bit color component values down to
4 bits per color component. If the target pixel format has fewer bits per color channel,
dither elements may need more bits.
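
A small Python sketch shows how a 4 x 4 matrix of 4-bit elements dithers an 8-bit component down to 4 bits. The matrix contents (a conventional ordered-dither pattern) and the clamp to 255 before truncation are assumptions for illustration; the text specifies only that the indexed element is added and the result truncated.

```python
M = 4
MASK = M - 1

# Hypothetical 4x4 ordered-dither matrix; every element fits in 4 bits.
BAYER_4X4 = [[0, 8, 2, 10],
             [12, 4, 14, 6],
             [3, 11, 1, 9],
             [15, 7, 13, 5]]


def dither_channel(value, x_window, y_window, matrix, out_bits=4):
    """Dither one 8-bit component down to out_bits bits."""
    # The low log2(M) bits of the window coordinates select the element.
    element = matrix[x_window & MASK][y_window & MASK]
    # Clamping before truncation is an assumption, to avoid overflow.
    dithered = min(value + element, 255)
    return dithered >> (8 - out_bits)
```

For an input of 0x88, neighboring pixels come out as 8 or 9 depending on the matrix element, so the average over an area approximates the discarded low bits.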

In one embodiment, the dither matrices are programmable with zero as the
default value for all elements. (This disables dithering.) The responsibility then falls on the
using software 8211 to appropriately load these matrices.

The described framework will suffice for most applications. Dithering is a
per-fragment operation.

- Logical Operations 

Like the blend unit 2F0, the logical-operations unit 2H0 computes a new
color value based on the incoming value and the value stored in the framebuffer 84G.
Logical operations for each color component value (red, green, blue and alpha) are
independent of each other. Table 21 shows the available logical operations in one
embodiment. (The "s" and "d" terms in Table 21 are abbreviations for "SOURCE" and
"DESTINATION.")

Logical operations are enabled if blending is disabled, that is to say, if
Blending_Enabled is FALSE. Unlike blending, the logical operations may be invoked in
color-index as well as RGBA mode, and the dithering does not apply if logical operations
are enabled.

- Tile Input and Output 

The pixel-out unit 280 prepares tiles for output by the back end 84C and for
rendering by the pixel block 84B. In preparing tiles for output, the pixel-out unit 280
performs sample-to-pixel resolution on the color, depth and stencil values, as well as
pixel-format conversion as needed. In preparing tiles for rendering, the pixel-out unit 280 gets
the pixel color, depth and stencil values from the back-end block 84C and does format
conversion from the input pixel format (specified by the Pixel_Format state parameter) to
the output pixel format (in one embodiment, RGBA8888) before the start of geometry
rendering on the tiles.

The pixel-out unit 280 also performs clears.

FIG. 5 is a block diagram of the pixel-out unit 280. The pixel-out unit 280
includes stencil-out, depth-out and color-out units 282, 284 and 286 receiving input from
the sample stencil, depth and color buffers 211, 212 and 2J0, respectively. The stencil-out
and depth-out units 282 and 284 both output to the per-pixel tile buffers 2K0. The
color-out unit 286 outputs to a format converter 287 that itself outputs to the buffers 2K0.

The pixel-out unit 280 also includes clear-stencil, clear-depth and clear-color
units 281, 283 and 285, all receiving input from the tile buffers 2K0. The clear units
implement single-clock flash clear. The communication between the clear units and the input
units (for example, the clear-stencil unit 281 and the stencil-in unit 288) happens via a
handshake. The clear-color unit 285 signals the format converter unit 28A that itself outputs
to a color-in unit 28B. The stencil-in, depth-in and color-in units 288, 289 and 28B output
to the sample stencil, depth and color buffers 211, 212 and 2J0, respectively.

The stencil-out, depth-out and color-out blocks 282, 284 and 286 convert
from sample values to, respectively, pixel stencil, depth and color values as described
herein. The stencil-in, depth-in and color-in blocks 288, 289 and 28B convert from pixel to
sample values. The format converters 287 and 28A convert between the output pixel
format (RGBA8888, in one embodiment) and the input pixel format (specified by the
Pixel_Format state parameter, in one embodiment).

- Tile Input

A set of per-pixel tile staging buffers 2K0a, 2K0b, 2K0c, . . . , (generically
and individually, 2K0α, and, collectively, 2K0) exists between the pixel-out block 280 and
the back-end block 84C. Each of these buffers 2K0 has three associated state bits (Empty,
BackEnd_Done and Pixel_Done) that regulate (or simulate) the handshake between the
pixel-out and back-end blocks 280, 84C for the use of these buffers 2K0. Both the
back-end and the pixel-out units 84C, 280 maintain respective current input and output buffer
pointers indicating the staging buffer 2K0a from which the respective unit is reading or to
which the respective unit is writing.

The pixel block 84B and the pixel-out unit 280 initiate and complete tile
output using a handshake protocol. When rendering to a tile is completed, the pixel block
84B signals the pixel-out unit 280 to output the tile. The pixel-out unit 280 sends color, z
and stencil values to the pixel buffers 2K0 for transfer by the back end 84C to the
framebuffer 84G. The framebuffer 84G stores the color and z values for each pixel, while
the pixel block 84B maintains values for each sample. (Stencil values for both framebuffer
84G and the pixel block 84B are stored identically.) The pixel-out unit 280 chooses which
values to store in the framebuffer 84G.

In preparing the tiles for rendering by the pixel block 84B, the back-end
block 84C takes the next Empty buffer 2K0a (clearing its Empty bit), step 1105, and reads
in the data from the framebuffer memory 84G as needed, as determined by its
Backend_Clear_Color, Backend_Clear_Depth and Backend_Clear_Stencil state parameters
set by a Begin_Tile packet, step 1110. (The back-end block 84C either reads into or clears
a set of bitplanes.) After the back-end block 84C finishes reading in the tile, it sets the
BackEnd_Done bit, step 1115.

The input filter 220 initiates tile preparation using a sequence of commands
to the pixel-out unit 280. This command sequence is typically: Begin_Tile, Begin_Tile,
Begin_Tile . . . . Each Begin_Tile signals the pixel-out unit 280 to find the next
BackEnd_Done pixel buffer. The pixel-out unit 280 looks at the BackEnd_Done bit of the
input tile buffer 2K0a, step 1205. If the BackEnd_Done bit is not set, step 1210, the
pixel-out unit 280 stalls, step 1220. Otherwise, it clears the BackEnd_Done bit, clears the color,
depth and/or stencil bitplanes (as needed) in the pixel tile buffer 2K0a and appropriately
transfers the pixel tile buffer 2K0α to the tile sample buffers 211, 212 and 2J0, step 1215.
When done, the pixel block 240 marks the sample tile buffer as ready for rendering (sets the
Pixel_Done bit).

- Tile Output 

On output, the pixel-out unit 280 resolves the samples in the rendered tile
into pixels in the pixel tile buffers 2K0. The pixel-out unit 280 traverses the pixel buffers
2K0 in order and emits a rendered sample tile to the same pixel buffer 2K0a whence it
came. After completing the tile output to the pixel tile buffer 2K0a, the pixel-out unit 280
sets the Pixel_Done bit.

On observing a set Pixel_Done bit, step 1125, the back-end block 84C sets
its current input pointer to the associated pixel tile buffer 2K0a, clears the Pixel_Done bit
(step 1130) and transfers the tile buffer 2K0α to the framebuffer memory 84G. After
completing the transfer, the back-end block 84C sets the Empty bit on the buffer 2K0a,
step 1135.
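
The three-bit handshake on a staging buffer can be sketched as a small state machine in Python. The class and function names are hypothetical, a tile is represented by an arbitrary Python object, and returning None or False stands in for a stall; only the ordering of the Empty, BackEnd_Done and Pixel_Done bits is taken from the text.

```python
class StagingBuffer:
    """One per-pixel tile staging buffer with its three state bits."""

    def __init__(self):
        self.empty = True
        self.backend_done = False
        self.pixel_done = False
        self.data = None


def backend_load(buf, tile):
    """Back end: take an Empty buffer, read the tile in, set BackEnd_Done."""
    if not buf.empty:
        return False                # no Empty buffer available
    buf.empty = False
    buf.data = tile
    buf.backend_done = True
    return True


def pixel_out_begin_tile(buf):
    """Pixel-out: on Begin_Tile, stall (None) until BackEnd_Done is set."""
    if not buf.backend_done:
        return None
    buf.backend_done = False
    return buf.data                 # transferred to the sample buffers


def pixel_out_emit(buf, rendered_tile):
    """Pixel-out: write the resolved tile back and set Pixel_Done."""
    buf.data = rendered_tile
    buf.pixel_done = True


def backend_store(buf, framebuffer):
    """Back end: on Pixel_Done, copy to the framebuffer and set Empty."""
    if not buf.pixel_done:
        return False
    buf.pixel_done = False
    framebuffer.append(buf.data)
    buf.empty = True
    return True
```

Each bit is set by one unit and cleared by the other, so a buffer is owned by exactly one unit at a time.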

-- Depth Output

The pixel-out unit 280 sends depth values to the pixel buffer 2K0a if the
corresponding Begin_Frame packet has cleared the No_Saved_Depth_Buffer state
parameter. The Depth_Output_Selection state parameter determines the selection of the
sample's z value. The following pseudo-code illustrates the effect of the
Depth_Output_Selection state parameter:



int SAMPLES_PER_PIXEL = 4;
int sorted_sample_depths [SAMPLES_PER_PIXEL];

if (Depth_Output_Selection == FIRST) {
    /* first sample */
    Sample_to_Output = 0;
} else {
    /* sort sample depths into sorted_sample_depths [] */
    Order_Sample_Depth_Values ();
    Sample_to_Output = sorted_sample_depths [
        (Depth_Output_Selection == NEAREST) ?
            0 : SAMPLES_PER_PIXEL - 1];
}
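
The depth selection can be sketched in Python as a function that returns the z value to output. The function name is hypothetical, and two assumptions are made: smaller z values are nearer, and any selection other than FIRST or NEAREST (written here as "FARTHEST") picks the last entry of the sorted list.

```python
def select_output_depth(sample_depths, selection):
    """Return the z value written out for one pixel."""
    if selection == "FIRST":
        return sample_depths[0]
    ordered = sorted(sample_depths)     # ascending: nearest first (assumed)
    return ordered[0] if selection == "NEAREST" else ordered[-1]
```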

-- Color Output

The pixel block 84B sends color values to the pixel buffers 2K0 if the
corresponding Begin_Frame packet has cleared the No_Saved_Color_Buffer state
parameter. The color value output depends on the setting of the Overflow_Frame,
Color_Output_Selection and Color_Output_Overflow_Selected state parameters. The
following pseudo-code outlines the logic for processing colors on output:



int SAMPLES_PER_PIXEL = 4;

color_selected = (Overflow_Frame) ?
                 Color_Output_Overflow_Selected :
                 Color_Output_Selection;

switch (color_selected) {
    case WEIGHTED:
        color_PIXEL = Compute_Weighted_Average ();
        break;

    case FIRST:
        color_PIXEL = first_Sample_Color;
        break;

    case DIRTY:
        /* average of the samples written in this pass */
        fcolor = (0, 0, 0);
        number_of_samples = 0;
        for (count = 0; count < SAMPLES_PER_PIXEL; count++) {
            if (Sample_Is_Dirty) {
                fcolor += sample_Source_Color;
                number_of_samples++;
            }
        }
        if (number_of_samples > 0)
            color_PIXEL = fcolor / number_of_samples;
        break;

    case MAJORITY:
        /* average of whichever group, dirty or clean, is larger */
        numFgnd = numBgnd = 0;
        fcolor = bcolor = (0, 0, 0);
        for (count = 0; count < SAMPLES_PER_PIXEL; count++) {
            if (Sample_Is_Dirty) {
                numFgnd++;
                fcolor += sample_Source_Color;
            } else {
                numBgnd++;
                bcolor += sample_Buffer_Color;
            }
        }
        color_PIXEL = (numFgnd >= numBgnd) ? fcolor / numFgnd :
                                              bcolor / numBgnd;
        break;
}



This computed color is assigned to the pixel.

For some options, like DIRTY_SAMPLES, the color may not be blended
between passes. This may cause some aliasing artifacts but prevents the worse artifacts of
background colors bleeding through at abutting polygon edges in the case of an overflow of
the polygon or sort memory. In any case, the application 8211 has substantial control over
combining the color samples prior to output.

The sample weights used in computation of the weighted average are
programmable. They are 8-bit quantities in one embodiment. These 8-bit quantities
are represented as 1.7 numbers (i.e., 1 integer bit followed by 7 fraction bits in fixed-point
format). This allows specification of each of the weights to be in the range 0.0 to a little
less than 2.0. For uniform weighting of 4 samples in the pixel, the specified weight for each
sample should be 32. The weights of the samples will thus add up to 128, which is equal to
1.0 in the fixed-point format used in the embodiment.
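
The 1.7 fixed-point weighting can be worked through in a short Python sketch. The function name, the representation of a color as an (R, G, B) tuple and the final clamp to 255 are assumptions for illustration; the arithmetic itself follows from 128 representing 1.0.

```python
def weighted_average(sample_colors, weights):
    """Resolve samples to one pixel color using 1.7 fixed-point weights.

    sample_colors: list of (r, g, b) tuples with 8-bit components.
    weights: list of 8-bit weights, where 128 represents 1.0.
    """
    pixel = []
    for channel in range(3):
        total = sum(w * color[channel]
                    for color, w in zip(sample_colors, weights))
        # Shift out the 7 fraction bits; the clamp is an assumption.
        pixel.append(min(total >> 7, 255))
    return tuple(pixel)
```

With four samples weighted 32 each, the red values 200, 100, 0 and 100 give (400 * 32) >> 7 = 100, which is their plain average.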

-- Stencil Output

The pixel-out unit 280 sends stencil values to the pixel buffer 2K0 if the
No_Saved_Stencil_Buffer flag is not set in the corresponding Begin_Frame packet. The
stencil values may need to be passed from one frame to the next and used in frame clearing
operations. Because of this, keeping sample-level precision for stencils may be necessary.
(The application 8211 may choose to use either 8 bits per pixel or 2 bits per sample for
each stencil value.) The Stencil_Mode bit in a Begin_Frame packet determines whether the
stencil is per-pixel or per-sample. In either case, the sample-level-precision bits (8, in
one embodiment) of stencil information per pixel are sent out.
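
The two stencil layouts occupy the same 8 bits per pixel. The Python sketch below shows one way four 2-bit per-sample stencil values could share a byte. The sample count of four, the bit ordering (sample 0 in the least significant bits) and the function names are assumptions for illustration only.

```python
SAMPLES_PER_PIXEL = 4   # assumed sample count
BITS_PER_SAMPLE = 2     # per-sample stencil precision
SAMPLE_MAX = (1 << BITS_PER_SAMPLE) - 1


def pack_sample_stencils(values):
    """Pack four 2-bit per-sample stencil values into one 8-bit word."""
    if len(values) != SAMPLES_PER_PIXEL:
        raise ValueError("expected one stencil value per sample")
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= SAMPLE_MAX:
            raise ValueError("per-sample stencil value out of range")
        word |= v << (BITS_PER_SAMPLE * i)   # assumed order: sample 0 lowest
    return word


def unpack_sample_stencils(word):
    """Recover the four 2-bit per-sample stencil values from an 8-bit word."""
    return [(word >> (BITS_PER_SAMPLE * i)) & SAMPLE_MAX
            for i in range(SAMPLES_PER_PIXEL)]
```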

— Pixel-Format Conversion 

Pixel format conversion happens both at tile output and at tile preparation for
rendering. Left or right shifting the pixel color and alpha components by the appropriate
amount converts the pipeline format RGBA8888 to the target format (herein, one of
ARGB8888, RGB565 and INDEX8).
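
The shift-based conversion can be sketched as below. The packed word layouts (R in the top byte of an RGBA8888 word, conventional 5-6-5 bit positions for RGB565), the opaque alpha on the way back in, and the function names are assumptions for the example; INDEX8 is omitted.

```python
def rgba8888_to_argb8888(pixel: int) -> int:
    """Move alpha from the low byte to the high byte (assumed word layouts)."""
    return ((pixel & 0xFF) << 24) | (pixel >> 8)


def rgba8888_to_rgb565(pixel: int) -> int:
    """Tile output: right-shift each 8-bit component down to 5, 6 and 5 bits."""
    r = (pixel >> 24) & 0xFF
    g = (pixel >> 16) & 0xFF
    b = (pixel >> 8) & 0xFF
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def rgb565_to_rgba8888(pixel: int) -> int:
    """Tile preparation: left-shift components back up; alpha assumed opaque."""
    r = (pixel >> 11) & 0x1F
    g = (pixel >> 5) & 0x3F
    b = pixel & 0x1F
    return ((r << 3) << 24) | ((g << 2) << 16) | ((b << 3) << 8) | 0xFF
```

Pure shifting loses the low-order bits on output and leaves them zero on input, so a round trip through RGB565 is not exact.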
Table 1: Begin_Frame and Prefetch_Begin_Frame Packets

Data Item                          Bits / Item   Source   Destination
Header                             5             MD
Blocking_Interrupt                 1             SW       BKE
WinSourceL                         8             SW       BKE
WinSourceR                         8             SW       BKE
WinTargetL                         8             SW       BKE
WinTargetR                         8             SW       BKE
Window_X_Offset                    8             SW       BKE
Window_Y_Offset                    12            SW       BKE
Pixel_Format                       2             SW       PDC, BKE
SrcEqTarL                          1             SW       SRT, BKE
SrcEqTarR                          1             SW       SRT, BKE
No_Color_Buffer                    1             SW       PIX, BKE
No_Saved_Color_Buffer              1             SW       PIX, BKE
No_Z_Buffer                        1             SW       PIX, BKE
No_Saved_Z_Buffer                  1             SW       PIX, BKE
No_Stencil_Buffer                  1             SW       PIX, BKE
No_Saved_Stencil_Buffer            1             SW       PIX, BKE
Stencil_Mode                       1             SW       PIX
Depth_Output_Selection             2             SW       PIX
Color_Output_Selection             2             SW       PIX
Color_Output_Overflow_Selection    2             SW       PIX
Vertical_Pixel_Count               11            SW       BKE
StencilFirst                       1             SW       PIX
Total Bits                         87

Table 2: End_Frame and Prefetch_End_Frame Packets

Data Item               Bits / Item   Source   Destination
Header                  5             MU
Interrupt_Number        6             SW       BKE
Soft_End_Frame          1             SW       MEX
Buffer_Over_Occurred    1             MEX      SRT, PDC
Total Bits              13

Table 3: VSP Packet

Data Item               Bits   Description
Header                  5
Mode_Cache_Index        4      Index of mode information in mode cache.
Stipple_Cache_Index     2      Index of stipple information in stipple cache.
Stamp_X_Index           3      X-wise index of stamp in tile.
Stamp_Y_Index           3      Y-wise index of stamp in tile.
Sample_Coverage_Mask    16     Mask of visible samples in stamp.
Z_Reference             32     The reference value with respect to which all Z
                               reference values are computed.
dZdX                    28     Partial derivative of z along the x direction.
dZdY                    28     Partial derivative of z along the y direction.
Is_MultiSample          1      Flag indicating anti-aliased or non-anti-aliased
                               rendering.
Total Bits              122

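
The field widths of the VSP packet in Table 3 can be checked and exercised with a small packing sketch. The field order within the word (header in the most significant bits), the name `Z_Reference` for the 32-bit reference-Z field, and the helper names are assumptions; only the widths come from the table.

```python
VSP_FIELDS = [
    ("Header", 5),
    ("Mode_Cache_Index", 4),
    ("Stipple_Cache_Index", 2),
    ("Stamp_X_Index", 3),
    ("Stamp_Y_Index", 3),
    ("Sample_Coverage_Mask", 16),
    ("Z_Reference", 32),   # assumed name for the 32-bit reference-Z field
    ("dZdX", 28),
    ("dZdY", 28),
    ("Is_MultiSample", 1),
]

VSP_TOTAL_BITS = sum(width for _, width in VSP_FIELDS)


def pack_vsp(values: dict) -> int:
    """Pack field values into one integer, first field in the top bits (assumed order)."""
    word = 0
    for name, width in VSP_FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_vsp(word: int) -> dict:
    """Inverse of pack_vsp: peel fields off from the least significant end."""
    fields = {}
    for name, width in reversed(VSP_FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields
```

The computed `VSP_TOTAL_BITS` agrees with the 122-bit total stated in the table.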
Table 4: Clear Packet

Data Item              Bits / Item   Source   Destination
Header                 5             SW       PIX
Mode_Cache_Index       4             MU       PIX
Clear_Color            1             SW       PIX
Clear_Depth            1             SW       PIX
Clear_Stencil          1             SW       PIX
Clear_Color_Value      32            SW       PIX
Clear_Depth_Value      24            SW       PIX
Clear_Stencil_Value    8             SW       PIX
Total Bits             75

Table 5: Tile_Begin and Prefetch_Tile_Begin Packets

Data Item                Bits / Item
Header                   5
First_Tile_In_Frame      1
Breakpoint_Tile          1
Tile_Right               1
Tile_Front               1
Tile_X_Location          7
Tile_Y_Location          7
Tile_Repeat              1
Tile_Begin_SubFrame      1
Begin_SuperTile          1
Overflow_Frame           1
Write_Tile_ZS            1
Backend_Clear_Color      1
Backend_Clear_Depth      1
Backend_Clear_Stencil    1
Clear_Color_Value        32
Clear_Depth_Value        24
Clear_Stencil_Value      8
Total Bits               95

Table 6: Pixel-Mode Cache_Fill Packet (Part 1 of 2)

Data Item                                     Bits   Description
Header                                        5
Mode_Cache_Index                              4      Index of the cache entry to replace.
Scissor_Test_Enabled                          1      Scissor test enable flag.
x_Scissor_Min                                 11     Scissor window definition: x_min
x_Scissor_Max                                 11     Scissor window definition: x_max
y_Scissor_Min                                 11     Scissor window definition: y_min
y_Scissor_Max                                 11     Scissor window definition: y_max
Stipple_Test_Enabled                          1      Stipple test enable flag.
Function_ALPHA                                3      Function for the alpha test.
alpha_REFERENCE                               8      Reference value used in alpha test.
Alpha_Test_Enabled                            1      Alpha test enable flag.
Function_COLOR                                3      Color-test function.
color_MIN                                     24     Minimum inclusive value of the color key.
color_MAX                                     24     Maximum inclusive value for the color key.
Color_Test_Enabled                            1      Color test enable flag.
stencil_REFERENCE                             8      Reference value used in the stencil test.
Function_STENCIL                              3      Stencil-test function.
Function_DEPTH                                3      Depth-test function.
mask_STENCIL                                  8      Stencil mask to AND the reference and buffer sample
                                                     stencil values prior to testing.
Stencil_Test_Failure_Operation                4      Action to take on failure of the stencil test.
Stencil_Test_Pass_Z_Test_Failure_Operation    4      Action to take on passage of the stencil test and
                                                     failure of the depth test.
Stencil_and_Z_Tests_Pass_Operation            4      Action to take on passage of both the stencil and
                                                     depth tests.
Stencil_Test_Enabled                          1      Stencil test enable flag.
write_mask_STENCIL                            8      Stencil mask for the stencil bits in the buffer that are
                                                     updated.

Table 7: Pixel-Mode Cache_Fill Packet (Part 2 of 2)

Data Item                     Bits   Description
Z_Test_Enabled                1      Depth test enable flag.
Z_Write_Enabled               1      Depth write enable flag.
DrawStencil                   1      Flag to interpret the second data value from the
                                     Phong block 84 A as stencil data.
write_mask_COLOR              32     Mask of bitplanes in the draw buffer that are enabled.
                                     (In color-index mode, the low-order 8 bits are the
                                     IndexMask.)
Blending_Enabled              1      Flag indicating that blending is enabled.
Constant_Color_BLEND          32     Constant color for blending.
Source_Color_Factor           4      Multiplier for source-derived sample colors.
Destination_Color_Factor      4      Multiplier for destination-derived sample colors.
Source_Alpha_Factor           3      Multiplier for sample alpha values.
Destination_Alpha_Factor      3      Multiplier for sample alpha values already in the tile
                                     buffer.
Color_LogicBlend_Operation    4      Logic or blend operation for color values.
Alpha_LogicBlend_Operation    4      Logic or blend operation for alpha values.
Dithering_Enabled             1      Dither test enable flag.
TOTAL                         253

Table 8: Color Packet

Data Item    Bits   Description
Header       1
Color        32     RGBA data.
TOTAL        33

Table 9: Depth Packet

Data Item    Bits   Description
Header       1
Z            32     Fragment stencil or depth data.
TOTAL        33

Table 10: Stipple Cache Fill Packet

Data Item              Bits   Description
Header                 1
Stipple_Cache_Index    2      Index of cache entry to replace.
Stipple_Pattern        1024   Stipple pattern.
TOTAL                  1031

Table 11: Alpha-Test Functions

Function_ALPHA    Value   Comparison
LESS              0x1     (A < alpha_Reference)
LEQUAL            0x3     (A <= alpha_Reference)
EQUAL             0x2     (A == alpha_Reference)
NEQUAL            0x5     (A != alpha_Reference)
GEQUAL            0x6     (A >= alpha_Reference)
GREATER           0x4     (A > alpha_Reference)
ALWAYS            0x7     (TRUE)
NEVER             0x0     (FALSE)

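
The 3-bit encodings in Table 11 behave like a bit mask: bit 0 corresponds to "less than", bit 1 to "equal" and bit 2 to "greater than", so that, for example, LEQUAL (0x3) is LESS (0x1) combined with EQUAL (0x2). The Python sketch below evaluates the alpha test on that basis. Treating the encoding as a mask is an observation drawn from the table values rather than a statement in the text, and the function name is hypothetical.

```python
ALPHA_FUNCTIONS = {
    "NEVER": 0x0, "LESS": 0x1, "EQUAL": 0x2, "LEQUAL": 0x3,
    "GREATER": 0x4, "NEQUAL": 0x5, "GEQUAL": 0x6, "ALWAYS": 0x7,
}


def alpha_test(function: int, alpha: int, reference: int) -> bool:
    """Return True if the fragment alpha passes the test encoded by `function`."""
    if alpha < reference:
        outcome = 0x1    # "less" bit
    elif alpha == reference:
        outcome = 0x2    # "equal" bit
    else:
        outcome = 0x4    # "greater" bit
    # The test passes if the function's mask includes the observed outcome.
    return bool(function & outcome)
```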
Table 12: Color-Test Functions

Function_COLOR    Value   Comparison
LESS              0x1     (C < color_MIN)
LEQUAL            0x3     (C <= color_MAX)
EQUAL             0x2     (C >= color_MIN) & (C <= color_MAX)
NEQUAL            0x5     (C < color_MIN) | (C > color_MAX)
GEQUAL            0x6     (C >= color_MIN)
GREATER           0x4     (C > color_MAX)
ALWAYS            0x7     TRUE
NEVER             0x0     FALSE

Table 13: Stencil Operations

Operation    Value   Action
KEEP         0x0     Keep stored value
ZERO         0x1     Set value to zero
MAX_VAL      0x2     Set to the maximum allowed. For pipeline 840, the
                     maximum stencil value is 255 in the per-pixel mode
                     and 3 in the per-sample mode.
REPLACE      0x3     Replace stored value with reference value
INCR         0x4     Increment stored value
DECR         0x5     Decrement stored value
INCRSAT      0x6     Increment stored value; clamp to max on overflow.
                     This is equivalent to the INCR operation in OpenGL.
DECRSAT      0x7     Decrement stored value; clamp to 0 on underflow. This
                     is equivalent to the DECR operation in OpenGL.
INVERT       0x8     Bitwise invert stored value

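
The stencil operations of Table 13 can be sketched in Python as below. The maximum value follows the table (255 per-pixel, 3 per-sample). The wrap-around behaviour of the plain INCR and DECR operations is an assumption inferred from their contrast with the saturating variants, and the function name is hypothetical.

```python
KEEP, ZERO, MAX_VAL, REPLACE, INCR, DECR, INCRSAT, DECRSAT, INVERT = range(9)


def apply_stencil_op(op: int, stored: int, reference: int,
                     per_sample: bool = False) -> int:
    """Apply one stencil operation and return the new stored stencil value."""
    max_val = 3 if per_sample else 255   # 2-bit or 8-bit stencil
    if op == KEEP:
        return stored
    if op == ZERO:
        return 0
    if op == MAX_VAL:
        return max_val
    if op == REPLACE:
        return reference & max_val
    if op == INCR:
        return (stored + 1) & max_val    # assumed to wrap on overflow
    if op == DECR:
        return (stored - 1) & max_val    # assumed to wrap on underflow
    if op == INCRSAT:
        return min(stored + 1, max_val)
    if op == DECRSAT:
        return max(stored - 1, 0)
    if op == INVERT:
        return ~stored & max_val
    raise ValueError("unknown stencil operation")
```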
Table 17: Color Blend Factors

Value                          Encoding   Blend Factors
ZERO                           0x8        (0, 0, 0)
ONE                            0x0        (1, 1, 1)
SOURCE_COLOR                   0x1        (Rs, Gs, Bs)
ONE_MINUS_SOURCE_COLOR         0x9        (1, 1, 1) - (Rs, Gs, Bs)
DESTINATION_COLOR              0x3        (Rd, Gd, Bd)
ONE_MINUS_DESTINATION_COLOR    0xB        (1, 1, 1) - (Rd, Gd, Bd)
SOURCE_ALPHA                   0x4        (As, As, As)
ONE_MINUS_SOURCE_ALPHA         0xC        (1, 1, 1) - (As, As, As)
DESTINATION_ALPHA              0x6        (Ad, Ad, Ad)
ONE_MINUS_DESTINATION_ALPHA    0xE        (1, 1, 1) - (Ad, Ad, Ad)
SOURCE_ALPHA_SATURATE          0xF        (f, f, f)
CONSTANT_COLOR                 0x2        (Rc, Gc, Bc)
ONE_MINUS_CONSTANT_COLOR       0xA        (1, 1, 1) - (Rc, Gc, Bc)
CONSTANT_ALPHA                 0x5        (Ac, Ac, Ac)
ONE_MINUS_CONSTANT_ALPHA       0xD        (1, 1, 1) - (Ac, Ac, Ac)

Table 18: Function_BLEND Values

Value                      Encoding   Operation
ADD (x, y)                 0x0        x + y
SUBTRACT (x, y)            0x1        x - y
REVERSE_SUBTRACT (x, y)    0x2        y - x
MINIMUM (x, y)             0x3        minimum(x, y)
MAXIMUM (x, y)             0x4        maximum(x, y)

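
A single-channel sketch of the blend functions in Table 18 follows. It assumes that x is the source value multiplied by its blend factor and y is the destination value multiplied by its blend factor, and that the result is clamped to [0, 1]; neither assumption is stated in the table itself, and the function name is hypothetical.

```python
ADD, SUBTRACT, REVERSE_SUBTRACT, MINIMUM, MAXIMUM = range(5)


def blend_channel(function: int, src: float, dst: float,
                  src_factor: float, dst_factor: float) -> float:
    """Blend one normalized channel value with the selected blend function."""
    x = src * src_factor   # assumed: x is the factor-weighted source term
    y = dst * dst_factor   # assumed: y is the factor-weighted destination term
    if function == ADD:
        result = x + y
    elif function == SUBTRACT:
        result = x - y
    elif function == REVERSE_SUBTRACT:
        result = y - x
    elif function == MINIMUM:
        result = min(x, y)
    elif function == MAXIMUM:
        result = max(x, y)
    else:
        raise ValueError("unknown blend function")
    return min(1.0, max(0.0, result))   # assumed clamp to the displayable range
```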
Table 19: Source and Destination Alpha Blend Factors

Value                          Encoding   Blend Factors
ZERO                           0x4        (0, 0, 0, 0)
ONE                            0x0        (1, 1, 1, 1)
SOURCE_ALPHA                   0x1        As
ONE_MINUS_SOURCE_ALPHA         0x5        (1 - As)
DESTINATION_ALPHA              0x3        Ad
ONE_MINUS_DESTINATION_ALPHA    0x7        (1 - Ad)
CONSTANT_ALPHA                 0x2        Ac
ONE_MINUS_CONSTANT_ALPHA       0x6        (1 - Ac)

Table 20: Effects of Blending_Enabled and Dithering_Enabled State Parameters

Blending_Enabled   Dithering_Enabled   Operation
TRUE               TRUE                Blending and dithering are enabled. Logical operations
                                       are disabled.
TRUE               FALSE               Blending is enabled. Dithering and logical operations
                                       are disabled.
FALSE              TRUE                Blending and dithering are disabled. Logical operations
                                       are enabled.
FALSE              FALSE               Blending and dithering are disabled. Logical operations
                                       are enabled.

Table 21: Logical Operations

Value            Encoding   Operation
CLEAR            0x0        0
COPY             0x3        s
NOOP             0x5        d
SET              0xf        all 1's
AND              0x1        s ∧ d
AND_REVERSE      0x2        s ∧ ¬d
AND_INVERTED     0x4        ¬s ∧ d
XOR              0x6        s xor d
OR               0x7        s ∨ d
NOR              0x8        ¬(s ∨ d)
EQUIVALENT       0x9        ¬(s xor d)
INVERT           0xa        ¬d
OR_REVERSE       0xb        s ∨ ¬d
COPY_INVERTED    0xc        ¬s
OR_INVERTED      0xd        ¬s ∨ d
NAND             0xe        ¬(s ∧ d)

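
The 4-bit encodings in Table 21 can be read as a truth table over the source bit s and the destination bit d: bit 3 is the result for (s=0, d=0), bit 2 for (s=0, d=1), bit 1 for (s=1, d=0) and bit 0 for (s=1, d=1). This reading is an observation consistent with the legible rows (for instance AND = 0x1, COPY = 0x3, NOOP = 0x5), not a statement from the text. Under that assumption, one generic routine covers all sixteen operations; the function name and the `width` parameter are hypothetical.

```python
def logic_op(code: int, s: int, d: int, width: int = 8) -> int:
    """Apply the logical operation with 4-bit encoding `code` to s and d, bitwise."""
    mask = (1 << width) - 1
    not_s = ~s & mask
    not_d = ~d & mask
    result = 0
    if code & 0x8:               # output 1 where s = 0 and d = 0
        result |= not_s & not_d
    if code & 0x4:               # output 1 where s = 0 and d = 1
        result |= not_s & d
    if code & 0x2:               # output 1 where s = 1 and d = 0
        result |= s & not_d
    if code & 0x1:               # output 1 where s = 1 and d = 1
        result |= s & d
    return result & mask
```

For example, with 4-bit operands s = 0b1100 and d = 0b1010, the XOR encoding 0x6 yields 0b0110.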
Table 22: State Parameters (Part 1 of 2)

Parameter
Stipple_Pattern
Pixel_Format
No_Saved_Stencil_Buffer
No_Stencil_Buffer
No_Z_Buffer
No_Saved_Z_Buffer
No_Color_Buffer
No_Saved_Color_Buffer
Color_Output_Selection
Color_Output_Overflow_Selection
DrawStencil
SampleLocations
SampleWeights
Depth_Output_Selection
Stencil_Mode
Tile_X_Location
Tile_Y_Location
Clear_Color_Value
Clear_Depth_Value
Clear_Stencil_Value
DepthClearMask
write_mask_STENCIL
Overflow_Frame
EnableFlags
Is_MultiSample
write_mask_RGBA
Function_ALPHA
alpha_REFERENCE

Table 23: State Parameters (Part 2 of 2)

Parameter
Function_COLOR
Constant_Color_BLEND
color_MIN
color_MAX
Function_DEPTH
Function_STENCIL
Stencil_Test_Failed_Operation
Stencil_Test_Passed_Z_Test_Failed_Operation
Stencil_and_Z_Tests_Passed_Operation
Source_Color_Factor
Destination_Color_Factor
Color_LogicBlend_Operation
Source_Alpha_Factor
Destination_Alpha_Factor
stencil_REFERENCE
mask_STENCIL
x_Scissor_Min
y_Scissor_Min
y_Scissor_Max




WO 00/11605 



PCT/US99/19363 



WHAT IS CLAIMED IS:

1. A method for rendering a graphics image, said method comprising:
    performing a fragment operation on a fragment on a per-pixel basis; and
    performing a fragment operation on said fragment on a per-sample basis.

2. The method of claim 1, wherein said step of performing on a per-pixel basis comprises
    performing one of the following fragment operations on a per-pixel basis:
    scissor test, stipple test, alpha test, color test.

3. The method of claim 1, wherein said step of performing on a per-sample basis comprises
    performing one of the following fragment operations on a per-sample basis:
    Z test, blending, dithering.

4. The method of claim 1, further comprising the step of:
    programmatically selecting whether to perform a stencil test on a per-pixel
    or a per-sample basis, and
    wherein between said steps, the following step is performed:
    performing said stencil test on said selected basis.

5. The method of claim 1, wherein said step of performing on a per-sample basis comprises
    programmatically selecting a set of subdivisions of a pixel as samples for use
    in said fragment operation on a per-sample basis, and
    wherein said method further comprises
    then programmatically selecting a different set of subdivisions of a pixel as
    samples for use in a second fragment operation on a per-sample basis; and
    then performing said second fragment operation on a fragment on a
    per-sample basis, using said programmatically selected samples.

6. The method of claim 1, wherein said step of performing on a per-sample basis comprises
    programmatically selecting a set of subdivisions of a pixel as samples for use
    in said fragment operation on a per-sample basis;
    programmatically assigning different weights to two samples in said set; and
    performing said fragment operation on said fragment on a per-sample basis,
    using said programmatically selected and differently weighted samples.

7. A method for rendering a graphics image, said method comprising:
    performing one of the following fragment operations on a fragment on a
    per-pixel basis: scissor test, stipple test, alpha test, color test;
    programmatically selecting whether to perform a stencil test on a per-pixel
    or a per-sample basis, and
    performing said stencil test on said selected basis; and
    programmatically selecting a set of subdivisions of a pixel as samples for use
    in a fragment operation on a per-sample basis;
    programmatically assigning different weights to two samples in said set; and
    performing one of the following fragment operations on a per-sample basis,
    using said programmatically selected and differently weighted samples: Z test,
    blending, dithering;
    then programmatically selecting a different set of subdivisions of a pixel as
    samples for use in a second fragment operation on a per-sample basis; and
    then performing said second fragment operation on a fragment on a
    per-sample basis, using said programmatically selected samples.

8. A method for rendering a graphics image, said method comprising:
    programmatically selecting whether to perform a stencil test on a per-pixel
    or a per-sample basis, and
    performing said stencil test on said selected basis.

9. A computer-readable medium for data storage wherein is located a
computer program for causing a graphics-rendering system to render an image by
    performing a fragment operation on a fragment on a per-pixel basis; and
    performing a fragment operation on said fragment on a per-sample basis.

10. A computer-readable medium for data storage wherein is located a
computer program for causing a graphics-rendering system to render an image by
    performing one of the following fragment operations on a fragment on a
    per-pixel basis: scissor test, stipple test, alpha test, color test;
    programmatically selecting whether to perform a stencil test on a per-pixel
    or a per-sample basis, and
    performing said stencil test on said selected basis; and
    programmatically selecting a set of subdivisions of a pixel as samples for use
    in a fragment operation on a per-sample basis,
    performing one of the following fragment operations on a per-sample basis,
    using said programmatically selected samples: Z test, blending, dithering;
    then programmatically selecting a different set of subdivisions of a pixel as
    samples for use in a second fragment operation on a per-sample basis; and
    then performing said second fragment operation on a fragment on a
    per-sample basis, using said programmatically selected samples.

11. A computer-readable medium for data storage wherein is located a
computer program for causing a graphics-rendering system to render an image by
    programmatically selecting whether to perform a stencil test on a per-pixel
    or a per-sample basis, and
    performing said stencil test on said selected basis.

12. A system for rendering graphics images, said system comprising:
    a port for receiving commands from a graphics application;
    an output for sending a rendered image to a display; and
    a fragment-operations pipeline, coupled to said port and to said output, said
    fragment-operations pipeline comprising
        a stage for performing a fragment operation on a fragment on
        a per-pixel basis; and
        a stage for performing a fragment operation on said fragment
        on a per-sample basis.

13. The apparatus of claim 12, wherein said stage for performing on a
per-pixel basis comprises one of the following: a scissor-test stage, a stipple-test stage, an
alpha-test stage, a color-test stage.

14. The apparatus of claim 12, wherein said stage for performing on a
per-pixel basis comprises one of the following: a Z-test stage, a blending stage, a dithering
stage.

15. A system for rendering graphics images, said system comprising:
    a port for receiving commands from a graphics application;
    an output for sending a rendered image to a display;
    the medium of claim 11; and
    a CPU, coupled to said port, said output and said medium, for executing said
    computer program in said medium.